Scalability Limits From SLRU & Lock Manager | Scaling Postgres 306

Episode 306 | March 10, 2024 | 00:14:29

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss scalability limits from the lock manager and SLRU caches, configurable SLRU cache sizes coming in Postgres 17, a PgBouncer bug-fix release, and transaction isolation anomalies.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/306-scalability-limits-from-slru-lock-manager/

 


Episode Transcript

[00:00:00] This week we're going to be discussing some scalability limits that some users have run into with Postgres, and these are more architectural limitations. But I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "Postgres Indexes, Partitioning and LWLock:LockManager Scalability". This is from ardentperf.com, and I like his intro here, quote: "I have decided that in Postgres circles, I shall henceforth refer to 2023 as the year of the lock manager's revenge." This piece of content, I think, is more than just a blog post; it's a really well researched, dare I say journalistic, piece on the LWLock:LockManager wait events that people have been running into, and you can see all the different references that are covered. He starts off talking about what the lock manager is, what it does and why it's important, and he references a presentation from Bruce Momjian. Basically, the wait event causing issues relates to an in-memory lightweight lock, split into 16 partitions or tranches, that protects the lock manager during concurrent access. So this is an architectural limitation in Postgres, and there are certain conditions where it can become a bottleneck, but only at, quote, "very high scale". The conditions that can trigger it are having a ton of indexes on a table, doing a join across tables with a lot of indexes, or querying a partitioned table with a lot of partitions, coupled with a high number of select queries. This was discussed in a previous episode of Scaling Postgres; I think it was the Postgres FM episode on using too many indexes, where they discussed this exact issue. He points to the RDS documentation, which actually documents this LWLock:LockManager wait event, and he covers the issues GitLab ran into as they were dealing with this, so you can see some of that information here. It looks like what might have been driving it was the index count more than partitioning: quote, "their project tables contain 57 indexes, so a transaction needs 58 locks to query it", and that's not even considering any joins. Now, some of the recommendations for these issues are adding capacity, so you could add more read replicas to spread the query load around, doing more caching at the application layer, dropping some indexes from the table, or partitioning your table so you get a lot of partition pruning going on. And something Nikolai mentioned in the Postgres FM episode as well is that using prepared statements should help reduce planning, because that's where a lot of this contention is happening. Then he talks about a Midjourney incident, where they were migrating to partitioned tables, with one partition per day for a particular queue they had, but they ran into enormous numbers of locks, and it was eventually resolved by, I think, moving to weekly partitions, so not as many individual partitions. He also shows how this can be very easily replicated using just three lines of pgbench code to show this LWLock:LockManager contention in action. So this post doesn't have a solution that's present in Postgres today to handle this particular limitation; we have to work around it at this point.
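If you want to check whether your own system is hitting this, here's a minimal sketch of the idea, not the exact pgbench script from the post, assuming a hypothetical orders table:

-- A query has to lock the table plus every index that might be used, and once a
-- backend needs more than 16 relation locks it can no longer use fast-path locking,
-- so it falls back to the shared, partitioned lock manager.
CREATE TABLE orders (id bigint PRIMARY KEY, customer_id int, status text);
CREATE INDEX orders_customer_id_idx ON orders (customer_id);
CREATE INDEX orders_status_idx ON orders (status);
-- ...imagine 20 or more indexes like these on the same table...

-- Under a heavy select workload, check how many sessions are waiting on the lock manager:
SELECT count(*)
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
  AND wait_event = 'LockManager';

If that count is regularly above zero on a busy system, this contention is likely what you're seeing.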
Working around it would mean reducing the query load if you can, by caching at the application layer, using additional read replicas, or reducing the number of indexes on your tables. It seems that using prepared statements could assist with this as well, so if you're using a pooler that doesn't have prepared statement support, maybe you should move to one that does. And with regard to partitions, maybe you want your partitions to be a little wider than you had planned: if you had planned to do daily partitions, maybe do weekly partitions, or if you wanted to do weekly partitions, maybe do monthly partitions. But again, keep in mind these are the limits of Postgres scalability, so if you don't have a large, super active database with thousands and thousands of transactions going on, you probably won't notice this. But I will say this is an awesome piece of content, and I'm still going to be studying it for a little while, looking at all the different references made within it that I still need to check out. That's why it's the top post this week.

[00:04:54] Next piece of content: "Waiting for Postgres 17: Configurable SLRU Cache Sizes for Increased Performance". This is from pganalyze.com, and it's Lukas's five minutes of Postgres. He's talking about a patch that recently went into Postgres 17 that works with the SLRU caches, where SLRU stands for simple least recently used. What's happening is that Postgres needs to store different activity on the database unrelated to your actual data, so it's metadata about the data. Examples he gives are using multixacts, or using NOTIFY, or subtransactions because of savepoints, and this does get stored on disk, but of course it gets cached, and that cache is this simple least recently used cache. This piece of content talks about people running into a multixact problem or a subtransaction problem; GitLab, mentioned again, ran into this, and they basically made an initiative to purge all subtransactions, or their use of savepoints, throughout their code to make Postgres perform better. But in 17 there's been a modification where each of these cache areas is divided into multiple banks, so different parts of the cache can be locked and searched separately, and breaking it up makes searching through these caches more efficient. In addition, they now let you configure the size of each of these caches, so this can hopefully alleviate some of the problems people experienced before. If you want to learn more about that, I encourage you to check out this piece of content.
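As a rough sketch of what that tuning could look like once you're on Postgres 17, assuming the new parameter names such as subtransaction_buffers and the multixact buffer settings, it might be something like this:

-- Illustrative values only; by default Postgres sizes these caches automatically.
-- The settings are expressed in 8kB buffers.
ALTER SYSTEM SET subtransaction_buffers = 512;     -- 512 x 8kB = 4MB for pg_subtrans
ALTER SYSTEM SET multixact_offset_buffers = 256;   -- multixact offset SLRU cache
ALTER SYSTEM SET multixact_member_buffers = 512;   -- multixact member SLRU cache
-- These allocate shared memory, so expect to restart the server for them to take effect.

The right values depend entirely on your workload, so treat this as a starting point for experimentation rather than a recommendation.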
[00:06:42] Next piece of content: PgBouncer 1.22.1 has been released. This is a bug fix release: for some clients using COPY FROM STDIN, some of those queries could introduce memory leaks, performance regressions, or prepared statement misbehavior. So that's not good, and it seems you should probably upgrade your PgBouncer version sooner rather than later. Checking the changelog, it looks like this particular bug was introduced in 1.21, so presumably if you're on an earlier version, you won't have these specific problems.

[00:07:20] Next piece of content: "Isolation Anomalies in PostgreSQL". This is from pgdash.io, and they're talking about transaction isolation levels in Postgres. Postgres by default uses read committed as its transaction isolation level, which basically means that in a database session you can read anything that has been committed to the database. So if you start one session here and another session there, as soon as a change is committed in one, you will see it in the other, and even if you've started a transaction with BEGIN, you will still see that data change, because as soon as something is committed you can see it. This particular anomaly is called a non-repeatable read. There's another anomaly called read skew, where you read some data in one session, then another session changes something, and when you make a change it's based on that older data, giving you something you didn't necessarily expect. If you want to avoid these anomalies, you can use the repeatable read transaction isolation level. As the post says, you do that by starting a transaction with BEGIN ISOLATION LEVEL REPEATABLE READ, which means you will read the same values no matter what has been committed by other transactions on the system. But in something like a read skew situation, you will start to get errors that prevent transactions from completing; for example, you may get the error "could not serialize access due to concurrent update". So if you move to the repeatable read level, you need to be prepared to retry transactions that fail. Even at the repeatable read level you can still encounter write skew, and the way to resolve that is to move to the serializable isolation level. But like repeatable read, you can also get errors there, so it means having to retry transactions. The other big disadvantage of the serializable isolation level is performance, because there is internal locking going on to make sure everything stays consistent, and there's a cost to that. So the default, read committed, is the most performant, but it does have some of these anomalies, and if you ever need a higher isolation level you can change it as this post describes. I will say I liked the little mini videos they posted to demonstrate the three anomalies they talked about: non-repeatable read, read skew, and write skew. So check out this blog post if you want to learn more.
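Here's a minimal sketch of that repeatable read behavior, assuming a hypothetical accounts table and two concurrent sessions:

-- Session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;   -- snapshot is taken here

-- Session 2, meanwhile (autocommit, so this commits immediately)
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Back in session 1
SELECT balance FROM accounts WHERE id = 1;   -- still sees the original value
UPDATE accounts SET balance = balance + 50 WHERE id = 1;
-- ERROR:  could not serialize access due to concurrent update
ROLLBACK;  -- the application has to retry the whole transaction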
[00:09:57] Next piece of content: there was another episode of Postgres FM last week. This one was on Rails plus Postgres, and Michael and Nikolai had Andrew Atkinson on to talk about Rails working together with Postgres. They definitely talked a lot about the ORM, Active Record, and the interaction between Rails and Postgres, as well as the different perspectives that developers have compared to people in the Postgres community. I thought it was pretty interesting, and you can listen to the episode here or check it out via YouTube down here.

[00:10:34] Next piece of content: "psycopg2 versus psycopg3 performance benchmark". This is from timescale.com, and they did some benchmarking of the new release of psycopg3, comparing it to psycopg2. You can see some significant performance improvements: it's almost double in this image here, and they have the individual test cases below. Wow, this example is about four times as performant, and generate_series looks a little more than double. Although psycopg2 came out on top in this large object test, they look relatively evenly matched there, maybe a little lower. So if you use psycopg2 and you're thinking about upgrading to psycopg3, you may want to check out this post to see what performance changes you can expect to get.

[00:11:23] Next piece of content: "CloudNativePG Recipe #1: Setting up your local playground in minutes". This is from gabrielebartolini.it, and it's a basic walkthrough for setting up CloudNativePG in your own environment and setting up your cluster. So if you're interested in learning more about that, you can check out this pretty brief blog post.

[00:11:45] Next piece of content: "Tips for tuning Postgres LIKE and ILIKE queries via pg_trgm on Azure". This is from techcommunity.microsoft.com, and even though it says "on Azure", pretty much this whole blog post is relevant for anyone using Postgres. They first talk about B-tree indexes and how you can do a prefix search as long as you create the index with the text_pattern_ops operator class, or the varchar and char variants of it. But if you want to search for text in the middle of a value, you usually need the pg_trgm extension, which breaks text up into three-character portions, and then apply a GIN index to get better performance with that. So if you're interested in that, you can definitely check out this blog post.
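As a rough sketch of those two indexing approaches, assuming a hypothetical products table with a name column:

-- Prefix searches (LIKE 'abc%') can use a btree index with the pattern ops operator class:
CREATE INDEX products_name_prefix_idx ON products (name text_pattern_ops);
SELECT * FROM products WHERE name LIKE 'wid%';

-- Searches inside the string (LIKE '%abc%') need trigrams plus a GIN index:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX products_name_trgm_idx ON products USING gin (name gin_trgm_ops);
SELECT * FROM products WHERE name ILIKE '%widget%';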
Next piece of content: "Why I Love psql (PostgreSQL Terminal)". This is from databaserookies.wordpress.com, and he has a presentation here that's very animated, showing different psql features, which is pretty interesting. He also has his psql tips listed, so you can check that out if you're interested.

[00:12:57] Next piece of content: "Comparing Postgres Managed Services: AWS, Azure, GCP and Supabase". This is from peerdb.io, and since PeerDB, as he says, helps replicate data from Postgres to data warehouses, this post might avoid some biases. He does a comparison of the different platforms and their performance running benchmarks, so you can check this out if you're interested in that.

And the last piece of content: PGSQL Phriday #016 is coming up, and it's "tuning that one big query". This is from softwareandbooz.com. The announcement is that this month's blogging event is about dealing with, quote, one problematic query, maybe even implying that it caused issues with your system, because he asks, where do you start when trying to dig in to get things running again? I know for most of my problematic queries, it's customers coming to me; it's not that their database is in a dire state, it's just that things are slow and we want to make them faster. But if you're interested, you can definitely join this blogging event.

[00:14:03] I hope you enjoyed this episode. If you want to get links to any of the content mentioned, be sure to head over to scalingpostgres.com. While you're there, you can sign up to receive weekly notifications of each episode, as well as get the podcast version and a transcript of each show. Other than that, thanks, and I'll see you next week.
Other Episodes

Episode 2

March 05, 2018 00:17:38

Analytical DBs, Recursive Queries, Replication | Scaling Postgres 2

In this episode of Scaling Postgres, we review articles covering analytical DBs (potentially using GPUs), recursive queries and different forms of replication. To get...


Episode 294

December 11, 2023 00:14:09

How Much Observability Is Needed? | Scaling Postgres 294

In this episode of Scaling Postgres, we discuss how much observability is needed, avoiding locks during database migrations, a huge Postgres events blogging event...


Episode 220

June 19, 2022 00:18:57

PG 14.4 Release, Subscription Skip, Using Schemas, Open Source Citus | Scaling Postgres 220

In this episode of Scaling Postgres, we discuss a PG 14.4 release, how to skip transactions with logical replication, how to use schemas and...
