UUID Use Cases, pg_settings, Automated Restore, Parallel Future | Scaling Postgres 102

Episode 102 February 24, 2020 00:12:44


Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss the use cases for UUIDs, using pg_settings, setting up an automated restore, and the future of parallelism.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/102-uuid-use-cases-pg-settings-automated-restore-parallel-future/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about UUID use cases, pg_settings, automated restore, and the parallel future. I'm Creston Jamison, and this is Scaling Postgres, episode 102.

[00:00:22] Alright, I hope you're having a great week. Our first piece of content is why I'm not a fan of the UUID data type, and this is from depesz.com, and he's basically stating the reasons why he's not a big fan of the UUID data type. Now, this is a controversial topic, as you can tell by the number of comments on this post, but still I found it particularly interesting. So the first problem, he says, is that UUIDs are completely opaque. So for example, you have these sequences of letters and numbers, whereas something like an integer is much easier to reason about, to remember if you're trying to reference it somewhere, and it typically occurs in order. The second problem: you can't really sort them very well. Now, there are ways around this. He mentions potentially version 1 of it, but also in the comments there's a link to an article that we actually covered on a previous episode of Scaling Postgres that enables you to have partially sequential UUIDs. The third problem: it's a pretty wide value, definitely larger than most integers you would be using. And then he did a bit of a performance test to compare the speed of doing a process with integers versus UUIDs, and the UUIDs were more than twice as slow. Although he does note that if this were a more complex table, the difference would be smaller. And then there are a number of comments that discuss this, including one linking to a sequential UUID generator, with comments going back and forth on this particular issue. Now, I would say I have to agree with him that I'm not a fan of UUIDs. And what I mean by that is that I tend to use integers as primary keys and only reach for UUIDs in specific use cases.
The first use case is if you're going to have a data set that's spanning multiple databases or servers; then I could see needing to use a UUID. So for example, some of the shared-nothing database architectures use UUIDs because you're literally generating IDs on separate machines. Another use case is if an actual client is generating a particular ID, having it generate a UUID that you can then store in the database, so the actual record gets created outside the database; then I could see using UUIDs. Similarly, with separate database systems, someone was mentioning how they have an on-premises solution and a hosted solution, and to make it easier to deal with, they used UUIDs so that someone could easily move back and forth between those situations. And that's listed in the comments here. So, an interesting discussion, but I kind of look at it from the perspective of: what are the use cases where I would use UUIDs?

Now, related to this is another article that's not PostgreSQL-specific, but it's talking about natural versus surrogate primary keys in a distributed SQL database, from the Distributed SQL Blog. Now, this is specifically about Yugabyte, but still, the kind of thing they mention here is when you would want to use a UUID as a surrogate primary key: when you're essentially running multiple databases, and in this case I believe they're kind of synchronizing with each other. So in the discussion of UUIDs, this was another article this week that, even though not specifically about Postgres, I thought was relevant.

And lastly, another article is generate a UUID as a PostgreSQL default value. So this is a relatively brief post that tells you how to set up UUIDs in your table as a default value. The first way is using the pgcrypto extension or the uuid-ossp extension. The latter is the one that I use, with uuid_generate_v4() for the default value when creating tables that need it.
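As a sketch of that approach (assuming the uuid-ossp extension is available to install; the table and column names here are purely illustrative):

```sql
-- Enable the extension that provides uuid_generate_v4()
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Use a generated UUID as the primary key default
CREATE TABLE accounts (
    id   uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    name text NOT NULL
);

-- Alternatively, pgcrypto provides gen_random_uuid():
-- CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- ... id uuid PRIMARY KEY DEFAULT gen_random_uuid(), ...
```

With a default in place, an INSERT that omits the id column gets a UUID generated server-side.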
So if you're interested in this UUID content, definitely check out one of these three blog posts.

[00:04:28] The next article is take advantage of pg_settings when dealing with your configuration, and this is from Luca Ferrari at fluca1978.github.io. Now, he's talking about when someone's trying to diagnose something and they ask you, what's your configuration? So you just grep, say, the postgresql.conf file to report what the configuration is. But the problem is that's only the stored configuration, not even considering the postgresql.auto.conf file, and not the runtime configuration. Now, he lists all the different downsides of this: it omits the postgresql.auto.conf file, it includes commented lines, and again, it doesn't reflect the running configuration. So how do you get the real configuration? He says here you use the pg_settings system view. So you can query it however you want, and it shows you what the current setting is and whether a change is pending a restart, meaning you need to do a restart to enable it. So this post is just a quick reminder of the existence of pg_settings, and you can use it in addition to SHOW commands to see what your current running PostgreSQL settings are.

[00:05:40] The next post is automated testing of PostgreSQL backups. So basically this is doing automated restores, and this blog post runs through two different scenarios. So say you're using pg_basebackup: they go through a scenario of the scripts needed to create a directory, unzip the items into that directory, and go ahead and start the database. Although if you're starting it on an existing system, you'll probably want to adjust the port and some additional settings. And then do a query to test that the restore was successful and had the data that you expected to be in there. So a table count or the last updated at, something of that nature, to validate that it was a successful restore. And they have a full script here to do that.
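A validation query along the lines they describe might look like this (a minimal sketch; the "orders" table and the one-day freshness window are hypothetical choices):

```sql
-- Returns a row only if the restored data is non-empty and fresh;
-- a wrapper script can treat an empty result as a failed restore test.
SELECT count(*)       AS row_count,
       max(updated_at) AS last_update
FROM orders
HAVING count(*) > 0
   AND max(updated_at) > now() - interval '1 day';
```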
And then they have a second version using pg_dump. So if you use pg_dump, it goes through the process of using either psql or pg_restore to reload the files, and the whole process to go through. And they also make note to watch out for triggers, because some triggers, particularly if they're contacting another service or some other database system, you want to disable those triggers as data is being loaded. So that's definitely something to keep in mind with pg_dump or pg_restore. And lastly, they cover doing point-in-time recovery testing and how you could set that up. So it's definitely a critical thing to do to verify that your backups are working and successful, and here's a great post that covers how you could do that. And this is from the pgdash.io blog.

[00:07:17] The next post is Parallelism, what next? And this is from the amitkapila16.blogspot.com blog. And in the initial set of paragraphs here, he covers what's been developed in terms of parallelism in Postgres from 9.6 through 12. Then he's talking about what is in progress for 13. The first one is a parallel vacuum, so being able to vacuum indexes in parallel, which could be a huge benefit if you have a lot of indexes on your table, because it's the indexes that tend to take the longest time with vacuum, in my experience.

[00:07:55] Second, improved EXPLAIN handling of per-worker details, which can be very beneficial when trying to understand what's going on with explain plans. And then, avoiding unnecessary shared memory writes in parallel hash join, so this could be a good performance benefit in certain cases. In terms of what's being discussed for the future, the first is parallel grouping sets. The second is a parallel COPY, so being able to load data with COPY in parallel; that could be advantageous. And then a parallel file foreign data wrapper, probably similar to the work being done for COPY, to be able to have that operate in parallel.
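To see the parallelism that already exists today, you can inspect a query plan with EXPLAIN (a sketch; "big_table" is a hypothetical large table):

```sql
-- On a sufficiently large table, the planner typically chooses a
-- parallel sequential scan under a Gather node.
EXPLAIN (ANALYZE, VERBOSE)
SELECT count(*) FROM big_table;

-- In the output, look for "Gather", "Workers Planned" and
-- "Workers Launched", plus the per-worker detail lines that
-- the VERBOSE option adds.
```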
And in the future, they're talking about doing more parallel work with things like GIN and GiST indexes, because right now it's primarily done for B-tree indexes. And then, of course, there are additional comments that you may want to review as well. So if you're interested in what's in store for parallelism in the future, definitely a blog post to check out.

[00:08:54] The next post is Shared Buffers, looking into the PostgreSQL I/O cache, and this is from cybertec-postgresql.com, and it discusses how you can get insight into what the shared memory cache is storing. He creates a simple test database to do it, and he's using the extension pg_buffercache to be able to get that information. Then he shows you how you can inspect per-database caching, as well as inspecting your current database. So if you have a desire to investigate what your cache is storing, definitely a blog post to check out.

The next post is which tables should be autovacuumed or autoanalyzed, update. So this is an update to a previous post where he basically did a prediction of what is the next vacuum that's scheduled to be run, or the next analyze expected to be run. But that version did not do it per table; it was only consulting the system-wide values that are set, say, in the postgresql.conf file. This one takes into account table settings as well. So if you've altered a particular table and adjusted the vacuum settings, this script now consults those as well. So if you're interested in this tool, definitely check out this blog post from depesz.

[00:10:16] Also from depesz.com is waiting for PostgreSQL 13: add %x to default PROMPT1 and PROMPT2 in psql. So it looks like what they're referring to here is that they've added some additional notifications on the transaction state. So for example, in the current version of Postgres, when you do a BEGIN, you do a SELECT and you get an error, and then do another SELECT and you're out of the transaction. On the prompt.
There's no additional notification; it's just the equals and pound symbol (=#), whereas in the new version it has additional prompt notifications to say, okay, we're inside a transaction here, and an exclamation point means we're inside a transaction that failed but hasn't been rolled back yet. So, some additional prompts as you're working with transactions that could be beneficial. So if you're interested in learning more, go ahead and check out this post.

[00:11:12] The next article is have an eye on locks of PostgreSQL, and this is from highgo.ca, and it's a post that talks all about locks. The first thing they mention here is that you can look at what locks exist using the pg_locks system view. Then they go into table-level locks and all the different share types that exist for them. Then they go into row-level locks and how they can be shared, covering transaction locks, page locks, and advisory locks. So if you're interested in learning more about locks in PostgreSQL, definitely a blog post to check out.

[00:11:50] The last piece of content is actually a tool. Now, I don't normally mention tools, but I thought this was interesting, and it's called pg_flame. It creates flame graphs of your EXPLAIN ANALYZE output, and it looks like this runs on a Mac, or it looks like you can work with Docker as well. But basically you take your EXPLAIN ANALYZE output and it can generate a flame graph for you. So if you want a more visual way of looking at your EXPLAIN ANALYZE output, definitely a tool to check out.

[00:12:20] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 197

January 09, 2022 00:14:35

Explain Analyze Buffers, Healthier Postgres, 2021 Review, High Availability Aspirations | Scaling Postgres 197

In this episode of Scaling Postgres, we discuss including buffers with explain analyze, having a healthier Postgres DB, the events of 2021 and aspiring...


Episode 272

July 03, 2023 00:14:39

Postgres 16 Beta 2, The Rise of Vectors, FDW Performance, Unused Indexes | Scaling Postgres 272

  In this episode of Scaling Postgres, we discuss the release of Postgres 16 Beta 2, the rise of vectors and storing them, Foreign Data...


Episode 243

November 27, 2022 00:10:25

IN vs ANY, Ghost Conditions, Percentage Calculations, Variadic Unnest | Scaling Postgres 243

In this episode of Scaling Postgres, we discuss how PG15 helps reduce replication lag, how to get the parameters used in prepared statements, PostGIS...
