Episode Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about sharding, federation, Linux huge pages and performance tuning. I'm Creston Jameson, and this is Scaling Postgres episode 187.
[00:00:22] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is Herding elephants: lessons learned from sharding Postgres at Notion. This is from Notion, and they're talking about sharding their Postgres monolith. So they did have a monolithic database, one single database running their application, and they decided to shard it into multiple databases, more specifically 32 physically separate databases. They talk about the process they went through and their experience with it. They store information from customers and they use a concept of blocks, files and spaces to store it. The first decision was how they wanted to shard. They looked at a few different technologies, such as Citus for Postgres, which is the well-known extension for scaling out Postgres across multiple instances. They also looked at Vitess for MySQL, but they weren't necessarily happy with the clustering logic, and they said they wanted more control over the distribution of their data. So what they actually decided to do is application-level sharding. Basically, the application needs to know which shard to go to to get the right data to deliver to the customer. Now, they decided to shard at the block level and everything connected to it. It looks like spaces or workspaces are associated with blocks, as are discussions and comments, so all of those tables were moved into this sharded database solution, and they decided to partition the data by their workspace ID. I'm assuming this is similar to an account ID or a customer ID, or maybe one level more granular, like maybe a customer can have more than one workspace ID. But that's basically how they decided to partition it, and they wanted to plan ahead to have sufficient capability to scale out in the future. What they decided on, as I mentioned, is 32 physical databases at this point, but they actually created 480 logical shards. The reason they did this is that 480 has a lot of factors, so it gives them a lot of choices for how to split the shards across however many databases they end up with. And here's an example: they went from the monolithic structure to 32 physical databases, each physical database has 15 logical shards, and it looks like they sharded based upon the schema, so there are essentially 15 schemas in each of the physical databases.

Now, one of the main reasons they said they started exploring sharding is that they were having issues with autovacuum keeping up, as well as a risk of hitting TXID wraparound. But they don't really mention table partitioning up here, even though partitioning is a way to break up your tables so they can be vacuumed faster and ideally avoid TXID wraparound. Now, that's not to say they wouldn't eventually have had to shard anyway, but I wonder if they actually considered partitioning first and, only if that got to a point where it didn't work all that well, moved on to sharding, because again, sharding is a lot of work to do. They did mention partitioning when they were deciding how to shard, and they decided not to do any partitioning at all; basically the application decides exactly which schema and table to contact. But even before sharding, I wonder if the partitioning solution was explored at all. I didn't see that mentioned here.
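To picture the 32-database, 480-shard layout they describe, here is a rough sketch of a schema-per-logical-shard arrangement. The names and the hash step are purely illustrative assumptions on my part, not Notion's actual code.

```sql
-- Illustrative only: one schema per logical shard, each holding its own copy
-- of the sharded tables (blocks, spaces, comments, and so on).
CREATE SCHEMA schema_017;
CREATE TABLE schema_017.blocks (
    id           uuid PRIMARY KEY,
    workspace_id uuid NOT NULL,   -- the sharding key
    content      jsonb
);

-- The application routes each workspace roughly like this:
--   logical_shard = hash(workspace_id) % 480    -- one of 480 logical shards
--   physical_db   = logical_shard / 15          -- one of 32 physical databases
--   schema_name   = 'schema_' || logical_shard  -- which schema on that database
```

The nice property of a number like 480 is that it divides evenly by 2, 3, 4, 5, 6, 8, 10 and so on, so the 15 schemas per machine could later be dealt out onto more physical databases without re-hashing anything.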
And then this blog post goes into great detail about how they actually did the migration: at first doing double writes to both systems, then doing a backfill of the sharded system, then doing a verification process, including interesting things like having separate staff members do the verification from whoever wrote the migration or the backfill, to ensure the verification wasn't just the migration code copied over. In other words, it was an independent verification. And then, of course, planning the switchover process.
[00:04:34] And they talked about the way they synced up the data between the two systems. They considered double writes to both databases, but they were a little concerned about inconsistencies arising from that. They considered logical replication, but ultimately it looks like they decided against it because they were having problems with the initial snapshot step. Interestingly, that's one of the things Postgres 14 improved; it really sped some of this up. Now, that's not to say it wouldn't have worked in other ways, but it's an interesting thing to think about. And the third way, which they ultimately went with, was an audit log with a catch-up script: basically capture every change in a log and then send it to the sharded database to catch it up. But this is a great post that goes into a lot of detail about how they decided to go with application-level sharding versus some other solution for their database growth, so I definitely encourage you to check it out.
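To make the audit-log idea a bit more concrete, here is a hypothetical sketch of what such a log and catch-up query might look like. This is my own illustration under assumed table and column names, not Notion's implementation.

```sql
-- Hypothetical: every write against the old monolith also appends a row here,
-- and a catch-up script later replays the entries against the sharded databases.
CREATE TABLE shard_catchup_log (
    id           bigserial PRIMARY KEY,
    workspace_id uuid        NOT NULL,  -- sharding key, determines the target shard
    table_name   text        NOT NULL,
    operation    text        NOT NULL,  -- 'INSERT', 'UPDATE' or 'DELETE'
    row_data     jsonb,
    logged_at    timestamptz NOT NULL DEFAULT now()
);

-- The catch-up script reads entries past its last checkpoint, in order, applies
-- each one to the shard owning that workspace_id, then advances the checkpoint.
SELECT id, workspace_id, table_name, operation, row_data
FROM shard_catchup_log
WHERE id > 123456          -- last id already applied (placeholder value)
ORDER BY id
LIMIT 1000;
```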
[00:05:29] The next piece of content is Multi-cluster Postgres federation on Kubernetes with Postgres 14 using GitOps. This is from Crunchydata.com, and it looks to be basically a refresh of a previous post, linked here, called Active-active federated Postgres cluster on Kubernetes. It essentially uses the newer version of the Postgres Kubernetes operator that they developed, along with some Postgres 14 specifics. But essentially the post is the same, where you have three regions: an east region, a central region and a west region. Each database writes locally and then uses logical replication to copy those changes to each of the other databases in the cluster. So something would be written to the central cluster locally, but then logical replication would send it over to east and west, and the data is partitioned so there's no chance of a conflict between them. For example, this is part of the setup for the Postgres operator, but here they get into creating the structure: essentially you have one hippos parent table, and then you have partitions for each area, east, central, west, and a default. Basically, data gets updated or inserted and placed into the default partition, but then a function runs as a trigger that moves it into the appropriate partition, as sketched below. So this is a pretty interesting technique, and you don't need Kubernetes for this, and you don't need Postgres 14; you basically just need a version that supports logical replication and you can implement this. If you want to learn more, along with doing it on Kubernetes, you can definitely check out this blog post.
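Here's that sketch. It's a simplified stand-in for the pattern described above, with made-up table and region names rather than the exact objects from the Crunchy Data post: a list-partitioned table with one partition per region plus a default, and a trigger on the default partition that re-routes locally written rows into this cluster's own regional partition.

```sql
-- One partition per region plus a default partition.
CREATE TABLE hippos (
    id     bigserial,
    region text NOT NULL,
    value  int
) PARTITION BY LIST (region);

CREATE TABLE hippos_east    PARTITION OF hippos FOR VALUES IN ('east');
CREATE TABLE hippos_central PARTITION OF hippos FOR VALUES IN ('central');
CREATE TABLE hippos_west    PARTITION OF hippos FOR VALUES IN ('west');
CREATE TABLE hippos_default PARTITION OF hippos DEFAULT;

-- Rows written locally with an unrecognized region land in the default
-- partition; this trigger re-inserts them tagged with this cluster's region
-- ('east' here), which drops them into the matching regional partition.
-- Logical replication then publishes each regional partition to the other clusters.
CREATE OR REPLACE FUNCTION route_to_local_region() RETURNS trigger AS $$
BEGIN
    INSERT INTO hippos (region, value) VALUES ('east', NEW.value);
    RETURN NULL;  -- suppress the original row in the default partition
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER hippos_default_route
    BEFORE INSERT ON hippos_default
    FOR EACH ROW EXECUTE FUNCTION route_to_local_region();
```

With this in place, a local write such as INSERT INTO hippos (region, value) VALUES ('pending', 42) would first fall into the default partition and then end up in hippos_east on this cluster.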
[00:07:14] The next piece of content is Why Linux huge pages are super important for database servers: a case with PostgreSQL. This is from Percona.com, and they're presenting Linux huge pages as a way to avoid out-of-memory errors in Postgres. Now, of course there are other benefits in terms of using less memory, which may have advantages for caching and things of that nature, but they go through and demonstrate the problem here. So they have a particular size of system, with 192 GB of memory and 40 CPU cores. They show the configuration here and they start running a benchmark. And you can look at the memory as their pgbench, or I should say sysbench, run progresses: the available memory goes from 71 GB down to 1.5 GB, then it actually starts using swap space, and the page table size grows to 25 GB from an initial size of 45 MB. So that's pretty huge. Now, this also makes some of these processes pretty large, which makes them a target for the out-of-memory killer if it's going to kill something. So they use huge pages as a potential solution. First they find the VmPeak for the postgres process so they can size the huge pages pool appropriately, then they set it up and restart the database. They can tell from stopping and starting the database that there's not too much of a difference in memory. But once they run the benchmark for many hours, you can see available memory starts at 38 GB and still ends at around 38 to 39 GB, so it's not consuming all of that memory. And they show that the page table size stays about the same, around 62 MB, as opposed to growing to 25 GB without Linux huge pages. So this is a pretty impressive improvement in memory usage, which frees memory for other things, as well as hopefully reducing the probability that the out-of-memory killer gets invoked in Linux. I'll include a rough sketch of the setup steps at the end of this segment. This is definitely a piece of content I highly suggest you check out.

The next piece of content is Webinar recap: Tuning tips to maximize Postgres performance. This is from Enterprisedb.com, and there is a linked YouTube video here that you can take a look at. It's about an hour in length, but it goes through all sorts of performance improvements, or things you should look at when tuning Postgres: hardware, VM settings, operating system configuration, a number of different Postgres parameters you can set, and finally query optimization. So this was a great piece of content to keep track of for when you need to optimize your Postgres installation.
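To make the huge pages steps above a bit more concrete, here is a minimal sketch of the sequence. The numbers are illustrative; sizing the pool from the postmaster's VmPeak follows the article's approach, but the exact values will differ on your system.

```sql
-- OS side first (shell, not SQL):
--   grep ^VmPeak /proc/<postmaster_pid>/status           -- postmaster peak virtual memory, in kB
--   sudo sysctl -w vm.nr_hugepages=<VmPeak_kB / 2048>    -- assuming 2 MB huge pages
-- Then tell Postgres to insist on huge pages (the default is 'try'):
ALTER SYSTEM SET huge_pages = 'on';
-- A full restart is required before this takes effect; afterwards verify with:
SHOW huge_pages;
```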
[00:10:05] The next piece of content is Tune sorting operations in PostgreSQL with work_mem. This is from aws.amazon.com. They're talking about configuring work_mem and how to find an appropriate setting for it so that sorts aren't using so much disk. More work_mem allows more sorting operations to happen in memory, so if you have work_mem set too low, they show some examples where you produce more temporary files. But the thing I tend to look at in an EXPLAIN ANALYZE plan is whether you're getting an external merge on disk. That can be an indication that your work_mem is too low. It may also mean that you need an index, and they do cover that here too: a query or index optimization could be more ideal than raising work_mem. You also need to be careful about setting work_mem too high. For example, they set it here to 1.4 GB, so you definitely wouldn't want to set that in the database configuration itself; they actually set it in the individual session they were connected to in order to run the query. So if you need to increase work_mem for certain queries, setting it per session can be a good solution, but generally you want to avoid setting it too high and having typical connections that don't need it use up all the memory of your system. And they cover some of this further down in the post. I'll show a quick sketch of the per-session approach at the end of this segment. So if you want to learn more details about setting work_mem for your system, definitely check out this blog post.

The next piece of content is Show tables in PostgreSQL: what's wrong with it? This is from Cybertec-postgresql.com. They're talking about SHOW TABLES, which is a command you can use in MySQL to look at all the different tables in a database. Postgres doesn't have this because it's not an ANSI SQL standard way of doing it, but Postgres has a couple of different ways you can do it. The main way that I use, which they mention first, is the psql client, because there you can use the backslash commands to get a list of tables. So you can use \dt to list the tables, look at them for a particular schema or a particular table, or even get extended details about them. You can also always go to the system catalogs with a SQL query against either pg_class or the pg_tables view. And lastly, there's the ANSI standard way: there's an information schema you can query, and looking at the information_schema.tables view you can get information about all the tables in the system. So these are some different ways you can query the schema of your database using Postgres.

The next piece of content is Using query ID in Postgres 14. This is from Rustprooflabs.com, and they're talking about a new enhancement in Postgres 14 where the query ID has been made more visible in more places in the system. Essentially, you can think of this as a stable identifier for queries, so you can match up queries across pg_stat_statements and pg_stat_activity to trace the behavior and activity of particular queries. So if you want to learn more about how you could potentially do that, and even log it, you can check out this blog post.
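Going back to the work_mem article, here is a small sketch of the per-session approach it recommends, with an illustrative table name:

```sql
-- With work_mem too low, the sort spills to disk:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders ORDER BY created_at;
--   Sort Method: external merge  Disk: ...

-- Raise work_mem only for this session (or use SET LOCAL inside a transaction)
-- instead of changing postgresql.conf globally:
SET work_mem = '256MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders ORDER BY created_at;
--   Sort Method: quicksort  Memory: ...
RESET work_mem;
```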
[00:13:26] The next piece of content is Backup label in PostgreSQL. This is from Highgo.ca. This explains the backup_label file and how it works in PostgreSQL, specifically when doing backups, typically exclusive backups: you run pg_start_backup and pg_stop_backup and back up the data files in between. It talks about the purpose of the backup label and how it's used. So if you want to learn more, you can check out this blog post.
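As a quick illustration of the flow the article covers, this is the exclusive low-level backup sequence as it exists through Postgres 14; the label text is arbitrary:

```sql
SELECT pg_start_backup('nightly-base-backup');
-- ... copy the data directory with your file-copy tool of choice ...
SELECT pg_stop_backup();
-- While the backup is running, a backup_label file sits in the data directory,
-- recording the checkpoint location that recovery must start from when the
-- copied files are restored.
```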
[00:13:51] The next post is a short one on multirange types in PostgreSQL 14. This is from postgresonline.com. They explain how you can use multirange types in Postgres 14, along with functions like the array aggregate array_agg as well as unnest, to work with the different multirange types. I'll include a couple of quick examples at the end of this segment.

The next piece of content is Installing PostgreSQL on OpenBSD, and this is from fluca1978.github.io. They're talking about installing Postgres on OpenBSD, so if you have an interest in doing that, you can check out this blog post.
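Circling back to the multirange post, here are a few self-contained Postgres 14 examples of the type, using made-up values; range_agg and unnest are the built-in functions I'd reach for to move between ranges and multiranges, independent of which functions the post itself demonstrates.

```sql
-- A multirange literal holds several non-overlapping ranges in one value:
SELECT '{[1,3), [5,9)}'::int4multirange;

-- unnest() splits a multirange back into its component ranges:
SELECT unnest('{[1,3), [5,9)}'::int4multirange) AS r;

-- range_agg() goes the other way, aggregating individual ranges into a multirange:
SELECT range_agg(r)
FROM (VALUES ('[1,3)'::int4range), ('[5,9)'::int4range)) AS t(r);
```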
[00:14:29] And the last piece of content: the PostgreSQL person of the week is Sebastiaan Mannem. So if you're interested in learning more about Sebastiaan and his contributions to Postgres, definitely check out this blog post.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.