Multigres Architecture | Scaling Postgres 418

[00:00:00] I always want to try to keep up to date with what's going on with sharding. Like for example, pgdog is a sharding pooler and the other project I've been tracking has been Multigres, of course. Well, they released about four blog posts this week going into more detail about some of their architectural decisions. So we'll be covering that, but I hope you, your friends, family and co workers continue to do well. [00:00:27] Our first piece of content is actually an architectural diagram before we get to the blog posts. And I think seeing a visual will help you understand the architecture of the system before we jump into some of the blog posts. But in terms of multigres, they basically have designed what they're calling a multi gateway. So this is essentially the gateway into the multigres sharding solution. [00:00:52] There is also a multipooler and the multi poolers exist on the Postgres instance. Now, the intention is for this to operate within Kubernetes, but basically on the host itself. You would have the multipooler running as well as your Postgres system running. And the multi pooler has all the connections, but it's the multi gateway that actually passes the application connections through to the pooler. So they actually have separated these responsibilities in PgBouncer. They're identical essentially. [00:01:26] And of course it has other components explained here. When you look at a multiple database deployment, basically you can have multiple gateways and of course you have as many multi poolers as you have Postgres instances. [00:01:39] So there's two gateways and two systems. And then they have an even more complex diagram here where you have typically three instances each with a multi pooler and you can have one or more multi gateways that interact with them. So that's a basic architecture. Let's jump into the first blog post, which is two jobs, two processes. Why multigrass has its own connection pooler and the first thing they talk about is what is multigras? Well, it's basically Vitess, which is the sharding solution for MySQL. So it's basically Vitess for Postgres. And like so many other choices, they've decided to make their own pooler and not use PgBouncer. It feels like this is the third time I've covered people not using PgBouncer in blog posts over the last month or so. But basically all the changes they wanted to make to it, it would really be a fork of it. And do we really need another fork of PgBouncer? No. So they actually built their own pooling solution and actually split it into two services, as they mentioned here. And their core objective is to bring sharding to Postgres, and they want it to work so that any single client connection has to reach any Postgres instance in the cluster. Aggregation happens at the top. And both of those things should essentially exist in one process. [00:03:02] Something that accepts client connections, parses queries and decides where they go. And that's the multi gateway. So all the applications talk to multi gateways. You can have one, you can have two, three, however many you need. [00:03:14] Then you have a per Postgres instance connection pool. So that's the thing that holds the backend connections open to Postgres, that is the multipooler. And it exists on the instance upon which Postgres is running. So if this is a Kubernetes cluster and you have three hosts with a Postgres database on each of the hosts, you would need a multi pooler on each of those hosts holding those Postgres connections. And the other component, which they don't talk about too much here, is the multi orchestrator that handles all of the high availability and the coordination of the different services. [00:03:47] Now, they do admit there are some disadvantages because there's now an extra hop on the query path, basically gateway to pooler over gpc. And of course their pooler does a few extra things compared to a vanilla pooler. Now, some other things they decided on is to use per user connection pools. So this is set this way predominantly for security reasons. And secondly, you can choose to pool without choosing a mode. So in PgBouncer, when you start it up, you generally choose whether you want it to be session based or transaction based usually. But multigres is smart enough to figure out on its own which mode of behavior a given query needs. And I think there's a blog post that talks more about that in detail. And they also do prepared statement consolidation across gateways, and that's in the blog post. One parse per query, no matter how many gateways. [00:04:41] So that leads us to the second post. Per user pools that share fairly. [00:04:46] So each user gets their own pool and the pool's back end connections are authenticated as that user. There's no impersonation that can happen with other pooling products. [00:04:55] And the main reason they did it was for security. You know, it makes security audits easier, but the cost is you have to share a connection budget. And they're using a maximin fairness algorithm essentially to decide how to handle this. And they show You a visual representation basically how that works. But the reality is I've only seen for large applications only a couple users usually. So maybe it's not so bad of a disadvantage. But even if you have the same user, you could also have different sessions for them if they're using session variables. Because if you remember the solution tries to interpret what queries are happening and does this particular connection need a session and it needs to see that session again when it comes back from the application. So then they discuss how that bucketing is kind of happening and they discuss some of the costs related to that. [00:05:50] Next blog post, pooling without choosing a mode. This goes into that in more detail where the system essentially looks at what queries are being sent and decides to whether to do. I think just transaction mode and session mode and based upon things outside of typical transactions that would happen. Maybe you have temp table creations copies portal, which is basically a prepared statement bound to specific parameter values. [00:06:17] Listen for Postgres, listen notify capabilities. Well, they basically rolled these up into what they call three pool tiers. So you have the regular pool which is pretty much doing transaction pooling. So there's no reason to reserve the connection. It can immediately be given to someone else. It's recyclable and most queries get served here. Then they have a reserved pool that connections have some reservation reason like was mentioned. Maybe it created a temporary table for example, and this is pinned to a single client. But they can move back to the regular pool when all the reasons clear. [00:06:53] So it sounds a little bit like AWS's RDS proxy. If you use session variables, it's going to pin the connection. It sounds like it's pretty much the same thing. So you don't want every connection to be using session variables because then everything is essentially pinned. So you do need to take care with how you're using this. And then they have an admin pool for administrative actions like terminating the backend. Then they mentioned some of the trade offs. There's more state state that's maintained in the pool. The detection has to be exhaustive and the regular reserve split is a config time choice like how much connection budget do you assign between them? And then the last blog post is one parse per query no matter how many gateways. [00:07:41] So prepared statements are great. The problem is if you have multiple gateways that clients are talking through. Ideally they wanted to fix it in one layer, but they actually had to do it in in two layers at the gateway and the multipooler. So that's great if it works as intended. [00:08:01] So those are some recent updates for multigres. One thing I did observe is that this seems to be a very architecturally heavy sharding olution compared to pgdog, which seems relatively simple. You have the one sharding pooler, but presumably that offers more benefits. As long as it's not so architecturally expensive it becomes difficult to maintain. But if you're interested in sharding, definitely check out these four Blog posts Next Piece of Content PG Backrest Will Continue this is from PG backrest.org and this is David announcing the project is continuing, which he alluded to on GitHub. Well now there's been a [email protected] and he mentions his sponsor which includes AWS, Supabase, PGEdge, Tiger Data, Percona and Eon IO. [00:08:56] So that's awesome. He got that many supporters for his solution, so that's great. [00:09:02] Next Blog post Keeping PG Backrest open, healthy and Community driven. This is from percona.com and they basically described why they're supporting and the need for the communities to support great open source projects like PG Backrest. So definitely feel free to check this out. [00:09:20] Next Piece of Content There was another episode of Postgres FM last week. This one was on PGflight recorder. So David Ventimigilia, my apologies if that's incorrect, joined Nick and Michael to discuss pgflightrecorder, which is another monitoring tool that actually runs inside the database to track various statistics on it. And it is using Nick's method of how he built PGQ with multiple partition tables that collect the stats and it uses snapshot imaging to collect the data. And David does work at Supabase, so this seems to be a Supabase supported project, which is great. And originally he just said I wanted to do a simple on the database way of tracking wait events, but since that time it's grown and grown to cover more and more information from various system views across the database. And you can install it on hosted platforms because it doesn't need to be an extension, although I think he said it can be installed as well via PGtle. [00:10:26] But if you want to learn more about it, go ahead and listen to the episode here or watch the YouTube video down here. [00:10:32] Next Piece of Content why giant in clauses slow down your app? This is from tigerdata.com and they're talking about a scenario where you have an end statement with a lot of characters in it and orms love to do this and they do this because you can't really do joins in Some cases, depending upon how you want to render the data. And this is the easiest way to do it. The problem is when you have thousands of IDs or text strings in that end list. The Postgres parser has to do a lot of work. It has to tokenize every single constant in the string, assign types to every element, and build a node tree for each item. But there's a more efficient way to do it. You can use an any array. So basically you use any and then you define an array inside of it. Here's an array of integers, and this is far more efficient. Like you can see, planning time dropped dramatically in this example, it dropped from 40 milliseconds down to 2 milliseconds. [00:11:36] You have a significant reduction in memory required to store the query string, and the planner is more likely to use a bitmap index scan. Now, of course I'm looking this and I wonder if this could be a planner optimization of Postgres. So if it sees an in clause, why not just convert it and use it as an any clause? Then it wouldn't require rewrite of the query. But basically be aware that this is a potential optimization that's available to you. [00:12:06] Next piece of content Egress problems and where to find them this is from planetscale.com and he's talking about the issue of downloading data. And if you have a database system and you're bringing data across the Internet, you want to make the data you're querying for as small as possible. [00:12:23] And in general, this just helps performance. So they use an example of this large table here, and you're wanting to pull data back again. Don't use select star, use select with specific columns. So the less data transferred, the faster it will be. And of course the less expense if you have to pay for egressive data. And if you have to pull out specific content from say a JSON B field, think about extracting data from that JSON B to pull it over to your application, as opposed to pulling back the whole JSON blob. But if you want to learn more about that, check out this blog post. Next piece of content partition merge split once more with locking. This is from thebuild.com and as he mentions here in Postgres, 17, merge partitions and split partition features were reverted before 17 was released. However, they're back in 19, but now they have a heavier lock on them. [00:13:20] So you can now merge partitions or split them, but there is a heavier lock on the parent table for the entire operation. And that heavier lock is an access exclusive lock. So no select or no insert or anything basically. So at least we have it and maybe you could use it during a downtime of sorts and maybe in the future we'll get one that doesn't require such a heavy lock. [00:13:47] Next Piece of Content welcome to order by Jungle. This is from boringsql.com and he goes all into order by and what happens if you order by a negative amount or negative column and all the different ways you can order by and how it works. [00:14:03] So if you want to learn more about that definitely check out this blog post. [00:14:06] Next Piece of content xid wraparound's equally evil twin this is from richien.com and he's talking about we have transaction id wrap around we need to be aware of, but we also need to be aware of multi transaction ID wrap around or multi exec ID wrap around and these get generated when you have a single row being locked by multiple transactions. So this is say typically select for share locks or even foreign keys can generate them as well. And he said much like the transaction ID settings, there are comparable settings for multi exact transaction IDs and be sure to monitor not only your transaction ID but also your multi exact transaction IDs to make sure you don't hit the limit. The limit is the same as transaction IDs but interestingly it actually freezes at a different amount. I think it freezes at 400 million as opposed to 200 million but it gives two queries to be able to monitor their current age, which I've seen a lot of monitoring tools don't monitor this. So this is something to check periodically, especially if you're using for share selects or you have foreign keys in your database. And here are exactly his recommendations. Add multi exact monitoring Watch your foreign key parent tables. Consider adjusting your multi exact freeze max age if necessary. And don't ignore what you think might be unnecessary vacuums because it could be doing multi exact freezing and Last piece of content how we save 40 gigabytes of Postgres space wasted on indexes. Why? Because they were future proofing their index decision. [00:15:50] So this is basically proving the point that indexes are not free, they take up space. They can actually impact your insert or update transaction performance. [00:16:01] So only use the indexes that are absolutely necessary for your application because they were having some performance issues and he narrowed it down to actually indexes they weren't even using and one that could be repurposed in the information captured by caching as opposed to requiring an index on the table. Oh, actually, sorry he mentioned here the first four indexes were unused, so these were unused, which was the vast majority of the space. [00:16:29] So definitely don't create indexes you don't need. I hope you enjoyed this episode. Be sure to check out scalingpostgres.com where you can find all the content mentioned, as well as sign up to receive weekly notifications of each episode. There you can also find an audio version of the show as well as a full transcript. Thanks. I'll see you next week.

Show Notes

Episode Transcript

Other Episodes

Episode 237

Postgres 15 Released, File Systems, Connection Pooling, Secure Connections | Scaling Postgres 237

Episode 28

Watch, Fast Text Search, Column Defaults, Import CSVs | Scaling Postgres 28

Episode 11

Serializable, JSON & JSONB, Fast Data Loading, PG11 Features | Scaling Postgres 11