4 Times Faster UUID Primary Keys | Scaling Postgres 368

Episode 368 June 01, 2025 00:19:10

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss the benefits of using time-based UUIDv7 vs. UUIDv4 for primary keys, how OpenAI uses Postgres, handling locks, and methods to migrate to partition tables with low downtime.

To get the show notes as well as get notified of new episodes, visit: 
https://www.scalingpostgres.com/episodes/368-4-times-faster-uuid-primary-keys/

Want to learn more about Postgres performance? Join my FREE training called Postgres Performance Demystified here: https://www.scalingpostgres.com/courses/postgres-performance-demystified/


Episode Transcript

[00:00:00] For many years we've had debates on whether you should use UUID primary keys or integers. And in terms of performance, the integers or the bigints always won out over UUIDs for a variety of reasons. [00:00:16] But a number of years ago people started talking about time-based UUIDs. [00:00:21] This is where the front portion is time-based, but the back portion is random. [00:00:26] Now this changes the game in terms of performance for UUIDs, and we're going to talk about some benchmarking that someone did with regard to that. And be sure to stay till the end of the show for my consulting corner segment, where I talk about different ways to convert a table to a partition table. [00:00:45] But I hope you, your friends, family and co-workers continue to do well. [00:00:50] Our first piece of content is "PostgreSQL UUID Performance: Benchmarking Random (Version 4) and Time-based (Version 7) UUIDs". [00:01:00] This is from dev.to, by Umang Sina. [00:01:04] Now this blog post is quite long, but it is very comprehensive, talking about the reasons why UUID version 7s are better than version 4s in terms of performance, and then also doing a detailed benchmark showing you the actual performance differences. So first he explains UUIDs and their different versions. Basically, the random UUID version 4 has most of its bits dedicated to randomness. So if you definitely need something to be as random as possible, still use version 4. But what version 7 does is sacrifice the 48 bits of randomness at the beginning of a UUIDv4 and use them for a Unix timestamp at millisecond precision, while the remainder stays random. Now, why this is important when you're talking about primary keys is that you get a relatively consistent order of the data being inserted, compared to a totally random UUID. Because with a random UUID, when you do an insert, it's going to be placed all over the disk.
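To make that bit layout concrete, here is a minimal Python sketch of a v7-style value: 48 bits of millisecond Unix time up front, random bits behind, with the version and variant bits set per the UUIDv7 layout. This is purely illustrative; it is not the author's Go generator or Postgres's upcoming function, and `uuid7_sketch` is a made-up helper name.

```python
import os
import time
import uuid

def uuid7_sketch() -> uuid.UUID:
    """Illustrative UUIDv7-style value: 48-bit ms timestamp + random tail."""
    ms = time.time_ns() // 1_000_000                 # Unix time in milliseconds
    rand = int.from_bytes(os.urandom(10), "big")     # 80 random bits for the tail
    value = (ms & (2**48 - 1)) << 80 | rand          # timestamp in the top 48 bits
    value &= ~(0xF << 76); value |= 0x7 << 76        # version nibble = 7
    value &= ~(0x3 << 62); value |= 0x2 << 62        # RFC 4122 variant bits = 10
    return uuid.UUID(int=value)
```

Because the high bits are a timestamp, values generated later compare greater, which is exactly what keeps b-tree inserts close to append-only.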
You're probably going to be having a lot of page splits, and it will also lead to a lot of full-page image writes in your WAL, significantly increasing your WAL activity. [00:02:22] And he looks at some of the comparisons down here between them. So in terms of insert locality, version 4 is very poor because it inserts entirely randomly, whereas the time-based UUIDv7 tends to be close to append-only. In terms of page splits, those rarely happen with version 7, whereas they are frequent with version 4. With version 4 you have a high amount of index fragmentation, whereas with version 7 it's low. Write amplification is high with version 4 and low with version 7. The disk I/O with version 4 is random, whereas it is mostly sequential with version 7. And then he says performance under high load can degrade over time with v4 because of all these other problems above, whereas v7 tends to remain more stable. But I didn't see a lot of details he mentioned with regard to this latter one. But let's get into his benchmark. He's looking at an eight-core system with 16 gigabytes of RAM. He created two tables with UUID primary keys and just a text field to store data. And you'll notice he's not using any functions within Postgres. So he's actually generating the UUIDs in a Go program, keeping them in memory. And he's inserting 10 million rows, 10,000-row batches at a time. So he's trying to keep everything in memory to really analyze the performance of the database. So he's not using any of the functions in Postgres to do the UUID generation. [00:03:52] So here's the code that he uses, and in terms of the insert performance, version 4 inserted in 5 minutes and 35 seconds, whereas version 7 inserted in 3 minutes and 38 seconds. So not twice as fast, but still a really significant performance improvement. [00:04:09] Next he checked on the disk usage, and as you would expect, version 7 is slightly smaller, by around 200 megabytes or so out of 3.6 gigabytes.
Now, I should say this is the total relation size he's looking at, which would include the indexes. And because of that, it looks like most of the size difference is due to the indexes, because the index size difference is pretty close to the total table size difference. And then he looked at the query performance of this table, and in terms of execution time, it was 4 times faster for UUID version 7 compared to version 4, just looking up a single key in the table. [00:04:50] Now, the planning was significantly slower for version 4, although I don't know how consistent this is with benchmarking. It would always be cool to have repeated runs and then see some sort of a standard deviation with them. The reason I don't quite know if this difference is valid is because when you look at doing a range scan, pulling out a range of IDs, the planning time was actually faster with version 4 compared to version 7. But in terms of execution time, version 7 was twice as fast pulling out a range of IDs compared to the version 4 random UUID. [00:05:24] So he has a set of conclusions down here. For version 7, the insert time was 35% faster, the table size was 5% smaller, the index size was 22% smaller, point lookup latency was four times faster, and range scan latency was two times faster. And this is basically why people have always pushed to keep using integers for primary keys, because these incremental identifiers definitely improve the performance of Postgres for all the reasons mentioned. But now that we're getting version 7 UUIDs in Postgres 18, this definitely gives a strong alternative to using integers for primary keys in terms of performance. [00:06:06] But if you want to learn more, I definitely encourage you to check out this blog post. [00:06:10] Next piece of content: "OpenAI: Scaling PostgreSQL to the Next Level". This is from pixelstech.net, and this was from a presentation given at PGConf.dev 2025 talking about how OpenAI uses Postgres.
And what's interesting is that they've used a single writer and many replicas for their scaling solution. So they say here: "we've used a single primary instance in Azure without sharding for a long time until we encountered write scalability limits." But I didn't see them mention sharding, at least in this particular post about the presentation. But they did mention some problems they had with their scaling, particularly with having so many read replicas, I think up to 12, to the point that network bandwidth was becoming a constraint. And they also had issues with optimizing vacuum as well. So they did a number of things to try to optimize their writes, because they were working with a single primary. [00:07:07] So basically trying to write less, or using lazy writes to smooth out write bursts, and controlling the frequency of writing during backfilling. So they're just trying to write more efficiently. And they set different timeouts, like idle_in_transaction_session_timeout and statement_timeout, trying to block queries that were taking too long. And they also kept a control on queries and tried to minimize too many joins happening in particular queries. [00:07:35] So a lot of this seemed like a typical scaling experience. [00:07:39] So I didn't get a lot new out of this, but I thought it was just interesting seeing how OpenAI is doing it now. But if you want to learn more, you can check out this blog post. [00:07:49] Next piece of content: there was another episode of Postgres FM last week. This one was on locks, and more specifically heavyweight locks that you can set at a table or row level. [00:08:01] And one thing that Nikolai started off mentioning is that if there's one thing you need to know about locks, it's that they don't get released until the transaction is complete. So if you're going to start a transaction and lock something, know that it will be locked until the end of that transaction.
For that reason, you want to keep your transactions as short as possible to avoid locking things for so long. And they mentioned how some people ask, well, can we get along without locks? But the thing about it is, if you want a consistent database, it requires locks. You have to lock something so that while one session is changing it, another session can't change it at the same time. Now, there are other databases that are more distributed and work on the concept of eventual consistency, so there the consistency isn't strong. [00:08:47] So you can avoid locks that way. But if you want something consistent, you're basically going to have to lock things. [00:08:53] And there are all sorts of different lock types. But the way that Nikolai and Michael like to describe it is that you basically have shared and exclusive locks. Shared locks are generally used for reading, and you can have many of them, whereas exclusive locks are for writing, and only one exclusive lock can be on an object at a time. And then within those you also have table-level and row-level locks. In terms of some recommendations, Nikolai also mentioned you should enable the log_lock_waits setting, because that gives you more information in your logs to try to understand what locking patterns are happening. There's also the great function pg_blocking_pids, so you can see what process IDs are blocking a particular process ID. So you can see sort of a lock queue and what's happening. Finally, they did go over deadlocks, and the way to resolve those is basically to make sure you are processing data in a consistent order, because deadlocks are a result of inconsistent ordering when processing something. And they also talked about advisory locks, which you definitely want to use sparingly. Frankly, I've never used advisory locks in my application development; I tend to just rely on row locks, because excessive use of advisory locks can lead to lock contention if you don't keep fast and tight control over them.
[00:10:16] But if you want to learn more, you can definitely listen to the episode or watch the YouTube video down here. [00:10:21] Next piece of content: "Big Problems With Big IN Lists, with Ruby on Rails and PostgreSQL". This is from andyatkinson.com, and actually this is not just applicable to Ruby on Rails; pretty much any framework has the ability to produce these IN lists. But I actually haven't seen big IN lists cause too much of a problem. [00:10:43] I mean, they can definitely slow things down in terms of planning and execution, but not that significantly, at least for the clients that I've been exposed to thus far. [00:10:53] But he says you can create this pattern explicitly if you run a query and basically pull in an array of IDs, and then use that array to say find the books where the author ID is in this array. That will essentially send a large IN list to the database. [00:11:12] But your ORM, again not just Ruby on Rails, I've seen this in Python as well, can create this pattern implicitly. [00:11:19] So for example, in Rails if you use includes or preload, it actually sends two queries to the database: one to get the list of books, and then if you include the author, another query against the authors table with essentially all the author IDs from the first query. [00:11:39] Now you can do some interesting things with the data in terms of having two queries like this, but it definitely can result in hundreds, thousands or even tens of thousands of IDs being inserted in here. So if you are hitting a performance bottleneck with this, you will probably want to convert that to a join. Now you can use the joins syntax or you can use the eager_load syntax in Ruby on Rails, but I'm sure there are ways to do that in other ORMs as well. So you can see here it just converts it to a join. [00:12:08] He also talks about some other techniques that you may want to try, such as using ANY instead of IN.
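The two-query IN-list pattern versus the single join can be shown in a few lines. This sketch uses Python's stdlib sqlite3 purely as a stand-in for Postgres (the blog post's examples are Rails and Postgres), with hypothetical books/authors tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT,
                        author_id INTEGER REFERENCES authors(id));
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO books VALUES (10, 'B1', 1), (11, 'B2', 2), (12, 'B3', 1);
""")

# Two-query pattern: fetch author IDs, then ship them back as an IN list.
# With thousands of books this IN list gets very large.
author_ids = [r[0] for r in conn.execute("SELECT DISTINCT author_id FROM books")]
placeholders = ",".join("?" * len(author_ids))
two_step = conn.execute(
    f"SELECT name FROM authors WHERE id IN ({placeholders}) ORDER BY name",
    author_ids).fetchall()

# Single-query pattern: the join avoids shipping the ID list entirely.
joined = conn.execute("""
    SELECT DISTINCT a.name FROM authors a
    JOIN books b ON b.author_id = a.id ORDER BY a.name
""").fetchall()
```

Both return the same authors; the difference is that the join keeps the ID set inside the database instead of round-tripping it through the application.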
This could lead to some efficiency because with ANY you're essentially passing an array rather than a list of scalars, or you could use the VALUES keyword. Now, on earlier versions of Postgres I have definitely seen this give performance improvements, but I think with more recent Postgres versions it's not as significant, although I don't know if it has stopped being the case entirely. And he mentioned improvements in 17 and 18 that can help in certain cases. [00:12:40] But if you want to learn more about this, you can definitely check out his blog post. Next piece of content: "Open Data Standards: Postgres, OTel, and Iceberg". This is from supabase.com, and I thought this was a good, I guess, thought leadership post about the state of the data landscape in terms of application development. [00:12:58] He talks a little bit about how Postgres has been so successful as an open source project, and then talks about three different data personas. So one, you have OLTP databases that developers use to build apps. You have telemetry to understand what's going on in terms of those applications. And then OLAP and data warehousing, which is used by data engineers or scientists to draw insights. And to a certain extent, this is an organizational maturation as well, because you generally start with your application on a database, then you need some telemetry behind it, and then at some point it becomes large enough that you need to actually start using online analytical processing and data warehousing as well. [00:13:38] And he looks at three open data standards. He says, you know, OLTP is Postgres nowadays; in terms of telemetry, that's the OpenTelemetry standard; and in terms of OLAP, it's the Iceberg standard. And you can see how multiple organizations are using the OTel standard, from Datadog to Honeycomb to Grafana and Elastic.
And multiple organizations are using the Iceberg OLAP file or data warehouse standard, in terms of Databricks, Snowflake, ClickHouse and DuckDB. And he says it explicitly right here: Postgres is the open OLTP standard, OTel is the open telemetry standard, which is basically in the name, and Iceberg is the open OLAP standard. But he also throws in S3 as well, because he sees it as the ultimate data infrastructure standard, in other words, a way to store data. And a lot of people are using it particularly to store data warehousing files. And AWS is making enhancements such as S3 Express, which is 10 times faster than standard S3 and actually just became 85% cheaper. And eventually maybe Postgres is going to be writing and reading to S3 to a certain extent. Now, he says there will always be disk there, but I remember OrioleDB, I think, showed an example of Postgres operating against an S3 data store, so there was no local disk. So I just thought this was an interesting thought piece, and if you want to learn more, you can definitely check it out. Next piece of content: "Backfilling Postgres TOAST Columns in Debezium Data Change Events". This is from morling.dev, and he's talking about Debezium, which is a change data capture utility that actually reads the WAL stream from Postgres to extract data changes. So basically the use case is to transfer data from your operational database into data warehouses, data lakes, or search indexes, typically with sub-second end-to-end latencies. But he says there are a few quirks about that, particularly with regard to TOAST columns, because when an update happens, if the TOASTed value is not modified, it does not get sent over again in the WAL stream. So basically you have to go in and backfill what that value is. So this blog post discusses that in detail. [00:15:56] So if you're interested in that, I definitely encourage you to check out this piece of content.
[00:16:00] Next piece of content: CloudNativePG 1.26.0 and 1.25.2 are released, and it looks like the new features in 1.26 are declarative offline in-place major upgrades, enhanced startup and readiness probes for replicas, and declarative management of extensions and schemas, which I find really interesting in terms of maintaining extensions with regard to CloudNativePG. But feel free to check this out if you want. And the last piece of content: Amazon Aurora DSQL is now generally available. This is from aws.amazon.com. So this is a distributed SQL database with active-active high availability and multi-region strong consistency. So from my understanding, I believe this is an eventually consistent database, so it's not perfect consistency, but they're saying it's strong. But it definitely gives you a lot of scalability and availability with that type of architecture. So feel free to check that out if you're interested. [00:16:58] And in terms of my consulting corner, we're going to talk about some different ways you can convert a large active table into a partition table. Now, I'm working with a client that is currently deleting a fair number of rows on a regular basis from some tables, but it would be ideal, causing less locking and less bloat, to go ahead and convert those tables to partition tables. So we're looking at some different avenues on how to do that, hopefully with minimal downtime. Now, one technique you can use is to have everything in a default partition. So you would create your partition table, then attach the existing table as the default partition of that partition table, and then migrate the data over in a piecemeal fashion into individual partitions. [00:17:44] Now, pg_partman provides a utility to help do this if you're using pg_partman, but I'm a little worried about the downtime aspect; that's still something we're looking into. [00:17:55] Another solution is creating a mirror table.
So basically you create a partition table, send writes to both tables, backfill the data from the original table to the partition table, make sure everything's in sync, and then cut over in one transaction to the new table. That's something that we could do. I've also seen people using views, where you have a view that talks to the original table, but the view has the name of the original table. [00:18:25] Then you set up the partition table as like a mirror, and then you redirect the view from the original table to the partition table. But I have some concerns about that, and I'm not sure we're going to be considering that option. But those are some of the solutions we're evaluating. So I'm wondering, what have you done? If you've converted a table into a partition table, what technique did you use and what tools did you use to do it? Feel free to let me know in the comments. [00:18:51] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find all the links to the content discussed, as well as sign up to receive weekly notifications of each episode. While you're there, you can also check out an audio version of the show as well as a full transcript. Thanks, and I'll see you next week.
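The dual-write, backfill, verify sequence of the mirror-table approach can be sketched abstractly. This uses Python's stdlib sqlite3 as a stand-in for Postgres (SQLite has no partitioning, so `events_new` just represents the partitioned target), and the table names and `dual_write` helper are hypothetical. In Postgres, the final cutover would be a single transaction renaming the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT);
    INSERT INTO events VALUES (1, 'old-1'), (2, 'old-2');
    -- Stand-in for the new partitioned table.
    CREATE TABLE events_new (id INTEGER PRIMARY KEY, payload TEXT);
""")

def dual_write(row_id: int, payload: str) -> None:
    # Phase 1: while migrating, the application writes to both tables.
    conn.execute("INSERT INTO events VALUES (?, ?)", (row_id, payload))
    conn.execute("INSERT INTO events_new VALUES (?, ?)", (row_id, payload))

dual_write(3, "new-3")

# Phase 2: backfill rows that predate dual-writing.
conn.execute("""
    INSERT INTO events_new SELECT * FROM events
    WHERE id NOT IN (SELECT id FROM events_new)
""")

# Phase 3: verify both tables are in sync before the rename/cutover step.
old = conn.execute("SELECT * FROM events ORDER BY id").fetchall()
new = conn.execute("SELECT * FROM events_new ORDER BY id").fetchall()
```

The verification step matters: you only want to take the cutover lock once you know the backfill has fully caught up.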
