New Scale Out Options | Scaling Postgres 370

Episode 370 | June 15, 2025 | 00:14:51

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss the start of Multigres, which is a Vitess for Postgres, pgactive going open source, getting started with logical replication, and a summer of upgrades.

To get the show notes as well as get notified of new episodes, visit: 
https://www.scalingpostgres.com/episodes/370-new-scale-out-options/

Want to learn more about Postgres performance? Join my FREE training called Postgres Performance Demystified here: https://www.scalingpostgres.com/courses/postgres-performance-demystified/


Episode Transcript

[00:00:00] One viable method of scaling your database is to do scale out. [00:00:05] So a lot of times this may mean setting up replicas and sending read traffic to those replicas so they can answer queries, and then saving your primary database for most of your writes. But what if you then want to scale the writes? Well, then a lot of times you're talking about sharding, and there are a couple of different solutions you can use. [00:00:26] Citus is the most well known one, but there are also some other products coming up, like PgDog, which is a pooler that allows you to do sharding. I think maybe Pgpool has some features in that vein as well. But we're going to be talking about a new option today coming from the MySQL world. I hope you, your friends, family and co-workers continue to do well.

Our first piece of content is "Announcing Multigres: Vitess for Postgres". This is from supabase.com, and basically Vitess is, quote, a "database clustering system for scaling MySQL", and it adds sharding, connection pooling, query routing, resiliency and failover, and cloud native orchestration via Kubernetes. Sugu Sougoumarane originally built Vitess at YouTube, and there's a YouTube video here from about 10 years ago describing Vitess and how it helped YouTube. But the plan is to create a new tool called Multigres that works with Postgres. So essentially it's a new proxy that sits in front of your Postgres database or, as you can tell, multiple Postgres databases. So presumably it allows failover across different nodes as well as sharding your data to those different nodes. And the goal for this project is to provide a gradual on-ramp: in other words, simple connection pooling on the low end, then high availability, all the way up to a sharding solution as you grow. So I personally find this very interesting and I can't wait to see what becomes of it. But if you want to learn more, feel free to check out this blog post.

[00:02:07] Next piece of content: "Announcing open sourcing pgactive: active-active replication extension for PostgreSQL". This is from aws.amazon.com, so apparently they've open sourced pgactive. So essentially this is like multi-master replication.

[00:02:24] So I was trying to get more details about it and clicked on the GitHub link and then the documentation link, and it does give you a good overview of its design, and how a lot of times these multi-master replication systems are deployed on a regional basis, because the consistency is really high in a given database, but there's definitely loose consistency between the different Postgres instances in a replication group. So basically the changes are asynchronous: changes are made in region B and then eventually they get replicated to region C and region A. So it uses essentially eventual consistency, but there is high consistency for a given Postgres instance that's part of the active-active group. They are using logical replication essentially to replicate the data between one another, and as with most logical replication, it looks like DDL replication is disabled by default, but it looks like there's some configuration, so maybe it's possible to replicate it. And the other thing I thought was interesting is it offers global sequences. So most sequences of course are internal to a given Postgres database, but this utilizes the concept of a global sequence that appends a unique node ID and a timestamp to the sequence.
So the identifier is much longer, but it definitely makes it a globally unique sequence for each of the instances in an active-active configuration. [00:03:54] And because it's eventually consistent, conflicts can happen, and they talk about how you can avoid them and different ways to log them. But feel free to check out this now open source project.

[00:04:06] Next piece of content: "First steps with logical replication in PostgreSQL". This is from boringsql.com, and this is a great post describing logical replication and its different use cases. The first thing they mention is the difference between physical replication and logical replication. Basically, physical replication is where you're streaming the WAL from a primary to a replica to keep it in sync, whereas logical replication sends the logical, row-level changes to the other database. In terms of the data streamed, physical sends binary WAL segments, whereas logical replication sends row-level changes. In terms of the scope, physical is a byte-for-byte stream; everything is as identical as it can make it. Whereas with logical you can actually pick and choose what tables, or even rows within tables, you want to send. In terms of the node type, a physical replica is only a read-only standby; you can't write to it. Whereas in logical replication, the subscriber is a fully writable instance as well. In terms of Postgres versions, with physical replication all the servers have to match, whereas with logical replication you can replicate across different versions. In terms of the database schema, everything moves in lockstep with physical replication, but with logical replication you actually have to apply schema changes separately to the subscriber. And in terms of use cases, physical replication is good for failover and high availability, and you can also do some read scaling, whereas logical replication is good if you want to copy data across, and you can use it as a method for zero downtime upgrades, et cetera.

Then they go through the process of setting up logical replication. So on the publisher you need to set the WAL level to logical and make sure you have enough replication slots and WAL senders to set up that replication, and you set up a user for the subscriber to connect as. Then you prepare the subscriber in terms of creating the database, creating the table, and then creating the subscription to start the logical replication (there's a minimal sketch of this setup a little further down). They talk a little bit about replication slots, which are important for setting up logical replication. They also talk about replica identity, because how a table is configured determines what kind of changes can be sent via logical replication: for example, you're not going to get updates and deletes if you have the replica identity set to NOTHING. They also cover schema changes and the fact that you need to keep the schemas in sync between a publisher and a subscriber to make sure the data transfers still happen, as well as some issues with managing logical replication while it's running. [00:06:35] So if you want to learn more about that, definitely check out this blog post.

[00:06:39] Next piece of content: this week was POSETTE. When I'm recording this today, it's actually the last day of it, but by the time you see this, all the videos should be live and ready for you to view. You can just click on the four different live streams that happened and see all the different presentations. So definitely a ton of video content.
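Going back to the logical replication walkthrough above for a moment, here is a minimal sketch of the publisher and subscriber setup the boringSQL post describes. The table, role, and connection details are hypothetical, and the post itself may use different names and options.

```sql
-- On the publisher: wal_level must be 'logical' (requires a restart),
-- with enough max_replication_slots and max_wal_senders available.
ALTER SYSTEM SET wal_level = 'logical';

-- A role the subscriber will connect as (hypothetical name and password).
CREATE ROLE repl_user WITH REPLICATION LOGIN PASSWORD 'secret';
GRANT SELECT ON orders TO repl_user;

-- Publish a single table rather than the whole database.
-- If a published table has no primary key, it needs REPLICA IDENTITY FULL
-- for updates and deletes to be replicated.
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- On the subscriber: the schema is not replicated, so create it first.
CREATE TABLE orders (
    id         bigint PRIMARY KEY,
    customer   text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Creating the subscription makes a replication slot on the publisher
-- and starts the initial copy plus ongoing row-level changes.
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=publisher.example.com dbname=app user=repl_user password=secret'
    PUBLICATION orders_pub;
```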
If you're interested in learning more about Postgres, POSETTE also has a link to watch all 42 talks, which takes you to a playlist as well.

[00:07:08] Next piece of content: there was another episode of Postgres FM last week. This one is on what to log, and Michael and Nikolay go over all the different settings they think you should go ahead and set to log in your database. The first one they talked about is logging checkpoints, so that you know when they're happening. Also setting log_autovacuum_min_duration, so that your vacuums are being logged and you can check those entries in case of issues.

[00:07:34] Next is log_statement, and the recommendation is to log DDL. Essentially you can log more if you want to, but usually at least DDL, unless you have tons of temp tables being created for some reason. Then there's logging connections and disconnections. I'm not too sure about this recommendation. I mean, it's beneficial to have it, but depending upon the application that's connecting to the database, it can cause a lot of noise. Like, I've actually had to turn it off for some clients because they weren't using a pooler and didn't necessarily want to use a pooler, so the logs were full of connection and disconnection entries. The next recommendation is to log lock waits. This is very important to track long lock waits that are happening. Next is to log temp files, so that you know if, for example, sorts are spilling to disk; this could be important to check. And the next recommendations are related to slow query logs: basically setting log_min_duration_statement to some value and then optionally taking a sample at a given sample rate to capture slow queries as well (a consolidated sketch of these settings appears a little further down). They did have a discussion about the log file itself, and I think Nikolay actually thought it would be pretty great if we could split logs for different purposes, like generating a separate log file for a slow query log while other logging happens in essentially the main log.

[00:08:54] I know my personal favorite way of parsing through logs is just using grep. [00:09:00] I've tried using various other kinds of tools and I still find it faster to grep through the logs to get the answer that I need. They did also talk about auto_explain and how that could potentially serve as a replacement for the slow query log, because it can be triggered to print out the slow query as well as its query plan, and that can be highly beneficial for diagnosing performance problems. But if you want to learn more, you can definitely listen to the episode or watch the YouTube video here.

Next piece of content: "Understanding high watermark locking issues in PostgreSQL vacuums". This is from shaneborden.com, and he had a customer he was working with where some select queries were being hung up by an exclusive lock, and what was causing the exclusive lock was autovacuum. This is basically one of the last steps of the vacuum or autovacuum process, where it tries to truncate free pages at the end of the table to return space back to the operating system, essentially. And that's what he calls here the truncation of the high watermark. So there's no used space within those trailing pages, and vacuum says, all right, let me go ahead and truncate this file because that space is not being used. But with some query patterns it can be hard to get even a very short lock to do this truncation, and it can cause locking issues, as they were experiencing here: basically, selects were hanging up.
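For reference, the logging settings discussed in that Postgres FM episode could be set roughly like this. This is only a sketch with placeholder thresholds; the exact values (and whether to log connections at all) depend on your workload, and the sampling settings shown are my assumption of what was meant (they exist in Postgres 13 and later).

```sql
-- Log checkpoints and autovacuum activity (0 logs every autovacuum run;
-- raise the threshold on busy systems).
ALTER SYSTEM SET log_checkpoints = on;
ALTER SYSTEM SET log_autovacuum_min_duration = 0;

-- Log DDL statements and connection churn (the latter can be noisy
-- without a pooler in front of the database).
ALTER SYSTEM SET log_statement = 'ddl';
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;

-- Lock waits longer than deadlock_timeout, and temp files spilling to disk.
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET log_temp_files = 0;

-- Slow query logging: everything over 1s, plus a 10% sample of
-- anything over 100ms (placeholder thresholds).
ALTER SYSTEM SET log_min_duration_statement = '1s';
ALTER SYSTEM SET log_min_duration_sample = '100ms';
ALTER SYSTEM SET log_statement_sample_rate = 0.1;

SELECT pg_reload_conf();
```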
The way they got around that locking issue is they actually configured the table not to do a vacuum truncate. So they ran ALTER TABLE, for whatever the table is, SET vacuum_truncate = false. So now it will not do that phase on this table, and once they set that, there were no select statements being blocked. You can still choose to run a truncate manually whenever you want, by running VACUUM (TRUNCATE) on that particular table, and here they're also adding VERBOSE, which is a good idea too, at least when you're running it interactively or want to log the most information about it.

[00:11:06] And they do have a recommendation of adding the pg_freespacemap extension, because that helps you understand whether the high watermark is growing or staying put. Because if it's growing, you still want to truncate relatively often, whereas if it's not really growing, you probably wouldn't need to truncate that often. They have a convenient script that they put together here to track whether that high watermark is staying stable or actually growing (a short sketch of these commands appears further down). So if you want to learn more, definitely check out this blog post.

[00:11:34] Next piece of content: "CORE database schema design: constraint-driven, optimized, responsive and efficient". This is from andyatkinson.com, and he came up with a mnemonic acronym, CORE, to describe how he thinks about database schema design. It should be constraint-driven: basically, try to use all the constraint options available, from data types to actual constraints on those data types. Next is optimized: optimize the schema design as much as you can in terms of updating tables or potentially using views, I suppose, applying the right indexes, et cetera. The next one is responsive, meaning whenever schema changes are needed, you are able to apply them in a responsive manner, ideally in a non-blocking way. So this could be indexing concurrently or using a lock_timeout, and these DDL changes are tracked in source code. And lastly, efficient: basically, only store the data that you need to, and if your data becomes particularly huge, consider table partitioning.

[00:12:37] So if you want to learn more about his CORE philosophy, you can check out this blog post.

[00:12:42] Next piece of content: "Using regular expressions and arrays in PostgreSQL". This is from cybertec-postgresql.com, and he thought this was a little bit of a novel way to use regular expressions and arrays together. [00:12:54] So he showed an example of an array and how you can use ANY to identify a single value within an array, but you can also use ALL to ask: does this match all the items in the array? And of course not every element in an array from 1 to 5 equals three, so it returns false in this case. But he says you can also use regular expressions and combine those together in innovative ways (there's a quick example below). Now, I'm not quite sure of the use case he was going for with this, but it is definitely a different way to query data. But if you want to learn more, check out this blog post.

And now it's time for my consulting corner. So apparently this is the Summer of Upgrades, because I'm actually working with three different clients doing Postgres upgrades. I still am doing a fair number of performance optimizations, but it definitely feels like the Summer of Upgrades right now. [00:13:48] For one client, we're just migrating from their current version up to Postgres 17.
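Going back for a moment to the ANY/ALL and regular expression post, here's roughly what those queries look like. These are made-up examples, not necessarily the ones from the Cybertec post.

```sql
-- ANY: true if at least one array element matches.
SELECT 3 = ANY (ARRAY[1, 2, 3, 4, 5]);   -- true

-- ALL: true only if every element matches, so this is false.
SELECT 3 = ALL (ARRAY[1, 2, 3, 4, 5]);   -- false

-- Combining regular expressions with arrays: does the string match
-- at least one pattern in a list of patterns?
SELECT 'error: disk full' ~ ANY (ARRAY['^warning', 'disk', 'timeout$']);  -- true

-- Or require the string to match every pattern in the list.
SELECT 'error: disk full' ~ ALL (ARRAY['error', 'disk']);  -- true
```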
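And circling back to the high watermark piece, a sketch of the commands described there might look like the following. The table name is hypothetical, and the monitoring query is just one simple way to use pg_freespacemap, not the actual script from the post.

```sql
-- Stop vacuum/autovacuum from truncating empty pages at the end of
-- this table (and taking the brief ACCESS EXCLUSIVE lock to do so).
ALTER TABLE big_table SET (vacuum_truncate = false);

-- Truncate manually at a convenient time instead.
VACUUM (TRUNCATE, VERBOSE) big_table;

-- With the pg_freespacemap extension you can watch how much free space
-- sits in the table, to judge whether the high watermark keeps growing.
CREATE EXTENSION IF NOT EXISTS pg_freespacemap;
SELECT count(*) AS total_pages,
       pg_size_pretty(sum(avail)) AS approx_free_space
FROM pg_freespace('big_table');
```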
For that first client, we're using a logical replication upgrade. Another client is on RDS, and the plan is to use a blue-green deployment to do the upgrade there. And then a third client is maybe not so much an upgrade as a migration into RDS. Unlike the episode I talked about last week, where Nikolay was proposing moving out of RDS, this is someone moving into RDS, but once they've migrated there, the plan is to upgrade to a more recent version of Postgres as well. So unfortunately this week I don't have anything great and insightful to say. I'm just giving an update of what's been going on, and I hope your summer is going fairly well.

[00:14:33] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content that was mentioned, as well as sign up to receive weekly notifications of each episode. There you can also find an audio version of the show, as well as a full transcript. Thanks and I'll see you next week.

Other Episodes

Episode 250

January 30, 2023 00:14:03

Performance Issue, Survive Without Superuser, Reserved Connections, Partition Management | Scaling Postgres 250

In this episode of Scaling Postgres, we discuss resolving a performance issue, how PG16 allows you to survive without a superuser, reserving connections and...


Episode 333

September 15, 2024 00:13:38

When Select Writes! | Scaling Postgres 333

In this episode of Scaling Postgres, we discuss when select can write, Postgres RC1 is released, Tetris in SQL and copy, swap, drop. To...


Episode 271

June 26, 2023 00:16:24

30% Faster, UUID Downsides, Growing WAL, Processes vs. Threads | Scaling Postgres 271

  In this episode of Scaling Postgres, we discuss how to get 30% faster performance, the downsides of UUIDs, having too much WAL and whether...
