Boosting Planning Performance | Scaling Postgres 387

Episode 387 October 12, 2025 00:15:54

Hosted by Creston Jamison

Show Notes

In this episode of Scaling Postgres, we explore how PG18 locking changes can boost planning performance, how to store data safely on a budget, and how to build a Parquet file archive solution, and we discuss the completion of the summer of upgrades.

To get the show notes as well as get notified of new episodes, visit: 
https://www.scalingpostgres.com/episodes/387-boosting-planning-performance/

Want to learn more about Postgres performance? Join my FREE training called Postgres Performance Demystified here: https://www.scalingpostgres.com/courses/postgres-performance-demystified/

Episode Transcript

[00:00:00] As you start to scale Postgres, eventually you will have to start contending with locks, and you may see different kinds of lightweight lock states in your monitoring tool or when looking at pg_stat_activity, or you may be dealing with other lock issues. So it's definitely important to understand the different locks that are available in Postgres, and we're going to talk about some quick blog posts with regard to that.

[00:00:26] Also, be sure to stay till the end, where in my consulting corner I'm going to talk about the conclusion of the summer, or fall, of upgrades. But I hope you, your friends, family and coworkers continue to do well.

Our first piece of content is actually a series of four, I guess I'll call them mini blog posts, and this is from Nikolai at Postgres.AI, where it looks like he's doing another Postgres marathon. I think the last one he did on Twitter, and this one he's doing as blog posts. The first one talks about lightweight and heavyweight locks. Basically, lightweight locks are, as he says here, an internal mechanism to coordinate access to shared memory structures. So it's usually something trying to read from or write to shared memory. You can see some of this lightweight lock contention when you look at pg_stat_activity for the wait_event_type of LWLock, and the wait event could be related to subtransactions, the SLRU cache, the buffer mapping, or the lock manager itself. There's no way to explicitly take a lightweight lock; it's just an internal mechanism of how Postgres works, although how you design your application can influence some of it. For example, avoiding subtransactions will help you avoid that type of lightweight lock, and not having excessive partitions or indexes on a table can help avoid some locking that happens during the planning stage, which we'll look at in the next blog post. And then there are heavyweight locks, or generally just locks, and this is where you're locking database objects implicitly or explicitly with a database lock, a table lock, an index lock, or even locks on the rows themselves. These are generally held until a commit or a rollback happens. Again, the lightweight locks are all just Postgres doing its internal operations; you don't have much control over those.

[00:02:21] But you can check this out if you want to learn a bit more about that. The next post he talks about is relation-level locks. These are essentially heavyweight locks on relations.

[00:02:31] So not just tables, but also things like indexes, sequences, TOAST tables, views, etc. He takes a picture from the Postgres docs, talking about conflicting lock modes and showing which lock modes block each other. And he makes an important point here: even though a mode name mentions "row", it isn't necessarily a row lock, it is still a relation-level lock. He also includes a reference to a Citus Data post that shows which operations conflict with one another. So two SELECTs won't conflict, and two INSERT/UPDATE/DELETEs won't conflict. Further down he goes into an example showing these different types of locks and what they look like in the pg_locks view when doing different operations. For example, doing a LOCK TABLE, then taking a look at what pg_locks shows, and what happens when you do a rollback. And he shows some implicit locking that can happen, for example with indexes.
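As a rough illustration of what's described above, here is a minimal sketch of how you might spot lightweight lock waits and inspect relation-level locks yourself. It only uses the standard pg_stat_activity and pg_locks views; the table name is hypothetical:

  -- Sessions currently waiting on a lightweight lock (e.g. LWLock:LockManager)
  SELECT pid, wait_event_type, wait_event, state, query
  FROM pg_stat_activity
  WHERE wait_event_type = 'LWLock';

  -- Watch the heavyweight (relation-level) lock taken by an explicit LOCK TABLE,
  -- using a hypothetical table named orders
  BEGIN;
  LOCK TABLE orders IN ACCESS EXCLUSIVE MODE;

  SELECT locktype, relation::regclass, mode, granted, fastpath
  FROM pg_locks
  WHERE pid = pg_backend_pid();

  ROLLBACK;  -- heavyweight locks are released at commit or rollback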
And he also highlights here that during the planning phase of a query, Postgres locks all tables participating in the query, and all indexes of those tables, with an AccessShareLock. [00:03:42] And this gets referenced more a bit later.

[00:03:45] If we look at the next one, he's talking about the roots of the LWLock:LockManager lightweight lock. This is a shared area that maintains all of these essentially heavyweight locks. There was an enhancement added to Postgres to actually partition this lock manager into 16 partitions, but lock manager contention was still happening.

[00:04:07] Then another enhancement was released in 9.2, introducing fast-path locks, and if you look in pg_locks you can see fastpath equals true for those. That's explained more in this next post, fast-path locking explained. What it essentially is, as he mentions here, is that instead of always going to shared memory, each backend, so each process in Postgres that's running a statement, gets its own private array to store a limited number of weak locks: AccessShareLock, RowShareLock, RowExclusiveLock. So this locking is maintained on each backend, and as it says here, there are exactly 16 slots per backend. So backends can take these very weak locks and release them very quickly, which is needed even when doing simple SELECTs against the database, because, as he mentions, weak locks on unshared relations, the ones I just mentioned, don't conflict. But a consequence of this is that if a heavier lock is needed, it actually has to check each backend to make sure there's not an existing shared lock on these resources before that process does the work it needs to do. So basically the stronger locks are still stored in the lock manager and it needs to be checked before one of those is granted, but the weaker locks are stored in the backends themselves; the consequence is that when a stronger lock needs to happen, it has to check each backend to make sure something's not currently locked. But even the 16 slots still aren't enough, particularly when you're talking about a lot of indexes or a lot of partitions, so it started causing contention, particularly during planning time, because each of the indexes and tables used in a query needs to be locked. That changed in Postgres 18, where there are no longer just 16 slots; it's now customizable and is based upon max_locks_per_transaction.

[00:05:57] He shows an example here using pgbench where he set max_locks_per_transaction to 16 and 128 and ran both of these pgbench runs. You can see this is mean execution time; the execution time is about the same, but the planning time explodes right about where it runs out of those 16 slots for fast-path locking on the backends, because you have 15 extra indexes plus the table, and that forces it into slow-path, or non-fast-path, locking. So it basically has to start using that central lock manager, and you can see how much worse it gets as you increase the number of indexes that are present. But when you set max_locks_per_transaction to 128, you see linear growth in the mean time; there's not this explosion of essentially process contention as it's dealing with locking. And I should add, this is an enhancement in Postgres 18 where increasing max_locks_per_transaction actually increases the number of slots available for fast-path locking. And you can see all the wait events here when max_locks_per_transaction is 16 are related to the LWLock lock manager.
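To make that concrete, here is a minimal sketch, assuming a Postgres 18 instance and a pgbench-initialized database, of raising the setting discussed above and then checking whether your own session's relation locks were granted via the fast path. The value 128 simply mirrors the benchmark in the post:

  -- On Postgres 18, fast-path slots scale with this setting (restart required)
  ALTER SYSTEM SET max_locks_per_transaction = 128;

  -- After restarting, run a query touching a table with many indexes or partitions
  -- and see whether its AccessShareLocks used the fast path
  BEGIN;
  SELECT count(*) FROM pgbench_accounts;

  SELECT relation::regclass, mode, fastpath
  FROM pg_locks
  WHERE pid = pg_backend_pid() AND locktype = 'relation';

  COMMIT;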
So basically lock manager contention is what's causing this explosion in the mean time. But if you want to learn more, definitely check out these blog posts.

[00:07:20] Next piece of content: Data safety on a budget. This is from ardentperf.com, and he's talking about how the best practice for keeping your data safe is probably storing three different copies of it. The problem with that is it gets super expensive, because he made up a scenario where you have three different environments, say production, development and testing, and maybe you want to run your data set in two different regions, and each of those clusters, he's using CloudNativePG as an example here, has three database systems. So data would be copied in each of those systems on the disks, and then you're using object storage for backups. I'm not quite sure what he means about the persistent storage using Ceph volumes; I'll have to look more into that. But basically you now have crazy storage amplification, and maybe you could get by storing this data once on a thousand-dollar enterprise drive, but now you have to spend $96,000 on the disks alone for all of this data. And he says even though this provides a lot of safety, maybe you can trim it down and, as the blog post title says, still have data safety but on a budget. He actually proposes several different methods here, usually storing at least three copies of something in different locations to make sure you can get the data back. So I thought this blog post was interesting from the perspective of considering the cost of the storage as well as how many copies of the data you're keeping.

Next piece of content: Exploring PostgreSQL to Parquet archival for JSON data with S3 range reads. This is from Shayan.dev, and he had an issue where they were storing a lot of JSON payloads in Postgres, and eventually they would, I guess, expire and need to be deleted, which would cause tons of vacuum issues. So they started using partitions, but they still wanted to keep some of the data around, so he actually explored storing this data in Parquet files on S3. He also talks about TOAST in general, and how when you are storing large JSON blobs, if you want to make one update to that blob, the whole row gets copied again, so it does cause a lot of bloat in the table. But most of this blog post focuses on ways to take this data and store it in a cold tier, as he calls it, in Parquet files on S3. He shows how he mapped the data over to it, and he's actually storing data in segments so he doesn't have to store one whole huge file. But because of all that breakout he's having to maintain a catalog of how things are recorded in these individual files to be able to find the data. So it is definitely a homegrown storage archiving system that he's developed here. But as I was reading this I was thinking, you know, I wonder if DuckDB could assist with some of this if you're storing data in Parquet files. And he actually did mention down here that, quote, there are services like DuckDB that allow you to query the same nature of data using SQL as well, and that's something he plans on looking into.

[00:10:35] But check this out if you want to learn more.
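Since the post's starting point was bulk deletes of expired JSON causing vacuum problems, here is a minimal sketch, with hypothetical table and partition names, of the partitioning approach mentioned above, where old data is removed by detaching and dropping a partition rather than by huge DELETEs. The Parquet/S3 export and catalog pieces are specific to his homegrown system and aren't shown:

  -- Hypothetical monthly-partitioned table of JSON payloads
  CREATE TABLE payloads (
      id         bigserial,
      created_at timestamptz NOT NULL,
      body       jsonb NOT NULL
  ) PARTITION BY RANGE (created_at);

  CREATE TABLE payloads_2025_10 PARTITION OF payloads
      FOR VALUES FROM ('2025-10-01') TO ('2025-11-01');

  -- Expiring a month becomes a cheap metadata operation instead of a big DELETE
  ALTER TABLE payloads DETACH PARTITION payloads_2025_10;
  -- ...archive the detached table (e.g. export to Parquet on S3), then:
  DROP TABLE payloads_2025_10;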
Next piece of content: CNPG Recipe 22, Leveraging the new supply chain and image catalogs. This is from gabrielebartolini.it, and CloudNativePG has actually started using a new system for storing their Postgres operand images. They say it helps with vulnerability scanning, image signing, a software bill of materials for each image, as well as optimized multi-arch builds. So they're basically trying to build a better imaging system to use with CloudNativePG. If you want to learn more about that, you can definitely check this blog post out.

Next piece of content: Testing CloudNativePG preferred data durability. This is from ardentperf.com, and this is kind of a follow-up to the cost post that he did, where he's actually looking at whether you can move from, say, required data durability to preferred. He ran a few experiments just to show the relative data loss with regard to different parameters. You could see that required with two instances had 0% loss, whereas preferred, depending on whether you had two or three instances, had some runs showing a 48% loss and some runs showing a 4% loss with the three-instance sizes, and seconds between kills and failover delay can also impact the data loss as well. So maybe this can help you determine whether you want to consider running CloudNativePG in its preferred data durability mode.

Next piece of content: there was another episode of Postgres FM last week. This one was on user management.

[00:12:10] In this one they described how Postgres basically has its own users that are independent from the operating system users, and sometimes that can be a point of contention, although, depending upon how you configure it, you can allow operating system users to have database user rights. They talked about how roles are the same thing as users, so a role can be a user or a group for other users. Nick mentioned the importance of ALTER DEFAULT PRIVILEGES for setting up permissions for new users coming in. They also covered ownership, so objects have their own ownership as well.

[00:12:46] And I'll throw in my two cents. Normally, when a development team is just getting started, it's usually sufficient to have one database user that is the owner of the database, and that user runs the migrations; it's usually the application user. That's pretty much the simplest way to go. If you want to be a little bit more secure, you could have two users: one of them does the migrations, so it can create and drop tables and create indexes, whereas the other user just does the DML operations, INSERT, UPDATE, SELECT, DELETE. And you may have additional users that maybe just want to read all the data, so you can give them the pg_read_all_data role (there's a rough sketch of this two-user setup below). But apart from that, once you start to get into larger organizations with more SREs, or site reliability engineers, working in the database, then it's definitely more important to have more consistent, or more complex, user management. To close out the episode, they did talk a little bit about passwords appearing in logs, and how you definitely want to be careful how you set and reset passwords, because a lot of times they can just show up in the logs, which is not great. But you can listen to the episode here or watch the YouTube video down here.

[00:13:57] And the last piece of content: there's been a webinar posted on the pganalyze YouTube channel called Hands-on Postgres 18: async I/O, B-tree skip scan, UUIDv7 and more.
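Here is a minimal sketch of that two-user setup, plus the ALTER DEFAULT PRIVILEGES piece and the built-in pg_read_all_data role (Postgres 14+). All role, database and schema names are hypothetical, and the '...' passwords are placeholders:

  -- Migration role owns the database and runs DDL (create/drop tables, indexes)
  CREATE ROLE app_migrator LOGIN PASSWORD '...';
  CREATE DATABASE app OWNER app_migrator;

  -- Application role only does DML
  CREATE ROLE app_rw LOGIN PASSWORD '...';
  GRANT USAGE ON SCHEMA public TO app_rw;
  GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_rw;

  -- Make sure tables created by future migrations get the same grants
  ALTER DEFAULT PRIVILEGES FOR ROLE app_migrator IN SCHEMA public
      GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_rw;

  -- Read-only users can simply be granted the built-in role
  CREATE ROLE analyst LOGIN PASSWORD '...';
  GRANT pg_read_all_data TO analyst;

  -- In practice, set real passwords interactively (e.g. psql's \password)
  -- so they don't end up in the server logs, as discussed in the episode.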
So if you can't get enough Postgres 18 content, you can definitely check out this video. And now it's time for my consulting corner. This week I actually closed out the summer, or fall, of upgrades, with two final upgrades that happened this week. For one I was up till 3:30 AM; for the next I was up till just after midnight. But thankfully both of them went pretty well, everything was successful, and the systems have been smooth thus far, so it's definitely a great relief. And I thought I should mention, even though I talk about doing these, this is really a team effort. There are multiple people involved doing different steps of the process, because that's the thing: every environment is different. They're using different tooling, they're using different monitoring, they're in different environments. So unfortunately there's not really a one-size-fits-all solution. But I will say a few things that do help you have a successful upgrade or migration or whatever you're working on. One, definitely script as much as you can, so you leave as little as possible to human interaction.

[00:15:18] And two, have many dry runs, so that at the point where you actually do it, the upgrade is essentially a non-event, because believe me, that will definitely reduce your stress levels. But other than that, I hope your upgrades or migrations go well in the future.

[00:15:35] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content discussed as well as sign up to receive weekly notifications of each episode. There you can also find an audio version of the show as well as a full transcript. Thanks, and I'll see you next week.
