Microsoft Acquires Citus Data, Split WAL, Maintenance Work Mem | Scaling Postgres 48

Episode 48 January 27, 2019 00:14:41

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we review articles covering Microsoft acquiring Citus Data, split WAL files, maintenance_work_mem and logical replication.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/48-microsoft-acquires-citus-data-split-wal-maintenance-work-memory-logical-replication/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about Microsoft acquiring Citus Data, split WAL files, maintenance_work_mem, and logical replication. I'm Creston Jamison, and this is Scaling Postgres episode 48.

[00:00:22] All right, the first big news of the week is that Microsoft has acquired Citus Data, and I've got two posts related to this. One is from the microsoft.com blog: Microsoft acquires Citus Data, reaffirming its commitment to open source and accelerating Azure PostgreSQL performance and scale. So this was pretty surprising. Recently Microsoft acquired GitHub, which is known for supporting a lot of open source work, and now they have acquired Citus Data, which develops a Postgres extension to horizontally scale your PostgreSQL database. So this is an interesting move by Microsoft, and it probably conveys that PostgreSQL on their cloud platform is pretty popular, because with this solution they're able to offer a high degree of scaling as part of their service too. If you're interested, definitely check out this blog post. The companion piece is on the Citus Data blog: Microsoft acquires Citus Data, creating the world's best Postgres experience together. Another interesting thing I wanted to mention: at PostgresOpen Silicon Valley 2018 there was a talk called Lessons from Building VeniceDB.

[00:01:44] VeniceDB is a project at Microsoft that uses the Citus Data extension in Postgres to do their analytics, I believe with regard to the Microsoft Windows patching system that patches all versions of Microsoft Windows in the wild. So this definitely has to be a huge installation, given how many devices are out there running Microsoft Windows. I wonder if some of this is related to part of their acquisition plans, but it's definitely an interesting piece of news that happened this week.

[00:02:20] The next post is The Curious Case of Split WAL Files, and this is from richyen.com. He's talking about a situation where a standby attempting a restore runs into a fatal error for a particular WAL file: "could not receive data from WAL stream", along with an error that the requested WAL segment has already been removed. Now, he says that even though it's reporting this, the file is there: an ls of the archive directory shows it's in the archive. So the replication effort never actually progresses. He mentions he's only witnessed this twice in the past eight years of using streaming replication, so it's a rare event. But from what he's discovered, it happens when a WAL entry is split across two WAL files, because some WAL entries will span two files, and the Postgres archive replay doesn't internally know that it needs both files to complete the restore in this case. He goes into some of the detail here, so I encourage you to read the blog post. One resolution in his case: because it was a low-volume database being replicated, running pg_switch_xlog() to switch to the next log file was able to resolve it, although that may not work in all cases. One thing that should work in most cases is using a replication slot. That way, Postgres will always hold on to the necessary WAL files and won't move or delete them prematurely. So this is a rare occurrence, but definitely a blog post to check out, and something to keep in mind in case you ever run into this situation.
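As a rough illustration of the replication slot approach mentioned above (this sketch is not from the post itself, and the slot name standby_1 is made up for the example):

    -- On the primary: create a physical replication slot so WAL segments the
    -- standby still needs are retained rather than recycled or archived away.
    SELECT pg_create_physical_replication_slot('standby_1');

    -- On the standby, reference the slot in its recovery settings
    -- (recovery.conf on PostgreSQL 11 and earlier, postgresql.conf from 12 on):
    --   primary_slot_name = 'standby_1'

    -- On the primary, monitor the slot so retained WAL doesn't fill the disk:
    SELECT slot_name, active, restart_lsn FROM pg_replication_slots;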
[00:04:04] The next post is How Much maintenance_work_mem Do I Need? This is from Robert Haas's blogspot.com site, and his TL;DR is: try maintenance_work_mem at 1GB and read on for more specific advice. I always enjoy his blog posts because they're very thorough and go into a lot of explanation, so I definitely encourage you to read this one in its entirety. To summarize some quick bits of it: maintenance_work_mem is mostly used for two purposes. One, he says, it controls the maximum amount of memory the system will use when building an index; otherwise it spills to disk. He mentions that's not usually a big problem, but that's one of its use cases. The second use is that it controls the amount of memory vacuum will allocate to store the TIDs of dead index tuples. You want this sufficiently large to handle large tables because, and I quote, "if it runs out of memory to store the TIDs of dead index tuples before it scans the whole table, it will stop the table scan, scan the indexes, discard the accumulated list of TIDs, and then resume scanning the table from the point at which it left off." So the problem is, if you don't have enough memory to hold all the dead tuples since the last vacuum, it's going to have to scan the indexes multiple times. If it has to do that two or three times on large tables with a lot of indexes, this could really extend the amount of time autovacuum takes to complete a whole vacuum of a table. So you definitely want it sufficiently sized to avoid that. On the other hand, he says, don't make it so large that you run out of memory, because every autovacuum worker will use that amount of maintenance work memory. In his example, if you have three autovacuum workers and you set it to 10GB, you're using 30GB of RAM just for maintenance work memory, which is probably a scenario you wouldn't want. So again, a lot of great information in this post, and I definitely suggest checking it out so you can better optimize your maintenance_work_mem setting.

The next post is How we solved a storage problem in PostgreSQL without adding a single byte of storage, and this is from Haki Benita, I believe. Basically, they had a query that was using a ton of disk space when it ran, because it was doing sorting with an external merge to disk. It was so large because they were doing a PARTITION BY over a blob, which can be fairly large in size, and this process used an amount of disk storage that far exceeded the work_mem setting. So they put their heads together and asked: how can we reduce the size of this? They looked at the average column size for the blob, and it was about 780 bytes. They said, maybe we can do an MD5 of it, hash it to reduce the size, and they got it down to an average size of 36 bytes, which enabled them to use an in-memory sort method, an in-memory quicksort. But they wanted to see if they could get it down further, so they used the pgcrypto extension to produce the MD5 hash as a binary type, which got it smaller still, and finally they stored it as a UUID type and got it down to 16 bytes.
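As a quick, hypothetical illustration of the 1GB starting point from the maintenance_work_mem post (the exact value should fit your memory budget):

    -- Try 1GB as a starting point, per the post's rule of thumb.
    ALTER SYSTEM SET maintenance_work_mem = '1GB';
    SELECT pg_reload_conf();

    -- Rough worst-case budget: autovacuum_max_workers x maintenance_work_mem
    -- (unless autovacuum_work_mem overrides it). For example, 3 workers x 10GB
    -- would be 30GB of RAM, which is usually too much.
    SHOW autovacuum_max_workers;

And here is a rough sketch of the hashing trick just described, using a made-up table and column; the article's actual schema and queries differ:

    -- Sorting or partitioning directly on a large value spills to disk once
    -- the sort exceeds work_mem, so hash it down to something much smaller.

    -- md5() in core Postgres returns 32 hex characters as text:
    SELECT md5(blob_column) FROM example_table;

    -- pgcrypto's digest() returns the raw 16-byte hash as bytea:
    CREATE EXTENSION IF NOT EXISTS pgcrypto;
    SELECT digest(blob_column, 'md5') FROM example_table;

    -- Casting the md5 text to uuid also stores it in 16 bytes:
    SELECT md5(blob_column)::uuid FROM example_table;

    -- A window function can then partition on the compact hash instead of the blob:
    SELECT row_number() OVER (PARTITION BY md5(blob_column)::uuid ORDER BY id)
    FROM example_table;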
In the post, they show how much disk was used for the sort operations when hashing the values before sorting. Now, this of course burns more CPU doing the hashing, so it did slow the query down: against the raw blob the runtime was 160 milliseconds, versus 374 milliseconds against the hashed blob. But compared to the disk space saved, and because this is a nightly job, it was a no-brainer to spend those CPU resources on hashing to make sure less disk space was used. So it's definitely an interesting technique I haven't really seen before, and I encourage you to check it out if you're interested.

[00:08:22] The next post is PostgreSQL Logical Replication Gotchas, and this is from the PG IO blog. They start off talking about streaming replication, which I usually consider physical streaming replication, basically copying the WAL files exactly from one system to another, and then they talk about what logical replication is: identifying specific tables whose changes you want to logically replicate from one database system to another. They show an example where you create a table, insert some values into it, and then create a publication on that table. In another database, you create the table you want the data to replicate into and create a subscription that essentially subscribes to that publisher; then any data changes produced in the source get replicated to the destination. But what this post concentrates on are some of the gotchas you have to be aware of. The first one they mention is the replication of schema changes: basically, schema changes aren't replicated over, so you need a way to work around that. They mention using a Django application, but I'm sure this would be applicable to any other application accessing the database. Their workaround suggestion is to disable the subscription, apply the schema change to both the source database and the destination database, and then re-enable the subscription. They also mention that if new tables are added to your publication, and your publication wasn't set for all tables, in other words you only did specific ones, you'll need to add them manually to the publication as well, and they give the command for how to do that; you'll also need to run a refresh subscription command. The next gotcha is sequences: sequence values are not passed from the source database to the destination database. This will be a gotcha if you're also inserting data into the destination database while replicating from a source, because you could get sequences colliding with one another. If you have that scenario, the workarounds they mention are to use an external source for the numbers, like ZooKeeper or etcd, or to use non-overlapping ranges; you could potentially even use UUIDs to avoid those types of collisions. Another gotcha they mention is tables without unique rows, and for that the workaround is basically to set the replica identity to full, so that the full row values are sent to the replication destination. They also go into differently partitioned destinations, which seems like a pretty unique use case, and I don't believe there's built-in support for that type of replication. So if you're using logical replication, or planning to, it's definitely a blog post to check out to see what kind of gotchas you can be aware of and potentially avoid.
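As a rough sketch of the commands discussed above (the table, publication, subscription, and connection details are made up for illustration):

    -- On the source (publisher) database:
    CREATE TABLE items (id bigint PRIMARY KEY, name text);
    CREATE PUBLICATION my_pub FOR TABLE items;

    -- On the destination (subscriber) database, create the same table, then:
    CREATE SUBSCRIPTION my_sub
        CONNECTION 'host=source-host dbname=sourcedb user=replicator'
        PUBLICATION my_pub;

    -- If you later add a table to a publication that isn't FOR ALL TABLES,
    -- add it on the publisher and refresh on the subscriber:
    ALTER PUBLICATION my_pub ADD TABLE another_table;
    ALTER SUBSCRIPTION my_sub REFRESH PUBLICATION;

    -- For a table without a primary key or unique index, send whole rows:
    ALTER TABLE some_table REPLICA IDENTITY FULL;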
[00:11:13] The next post is pg_permission: Inspecting your PostgreSQL security system. This is from the cybertec-postgresql.com blog, and they have created an open source tool, an extension called pg_permission. As they say, it lets you gain a faster overview and list all permissions, so it's a way to more efficiently, they believe, list all the permissions a particular role has, compare your desired state to the current state, and instantly fix errors. The views let you look at table permissions, view permissions, column permissions, all sorts of different permissions, and you can set permission targets so you can determine where the current state differs from what you're targeting. Once you have more users in PostgreSQL and need a somewhat more sophisticated way to manage permissions, this is one open source tool that can potentially help: basically you update the views and pg_permission will execute the desired GRANTs and REVOKEs based upon your settings. So if you're interested in doing that, it's definitely a blog post and a tool to check out.

The next post is Scheduling Backups En Masse with the Postgres Operator, and this is from the crunchydata.com blog. Postgres Operator is a tool developed by Crunchy Data for managing multiple databases in a Kubernetes environment. One of the jobs it helps you with is managing backups: if you have tens or maybe even hundreds of PostgreSQL databases in Kubernetes, this is a tool that helps you schedule the backups and handle some other responsibilities. There's also a companion blog post called What's New in Crunchy PostgreSQL Operator 3.5, which talks about some new capabilities with pgBackRest, because I believe that's the backup tool it uses for backing up all of the different instances in a Kubernetes cluster. So if you're using PostgreSQL and Kubernetes, maybe you want to check out this open source tool.

[00:13:29] The last piece of content is actually a YouTube channel. The 2ndQuadrant PostgreSQL YouTube channel posted three videos this week, but they are a little bit older. I believe the first one is about two years old, because it talks about the PostgreSQL 10 roadmap. The last two were done within about the last six months. One talks about PostgreSQL 11 and its new features, so if you want a review of that, it would be a good presentation to watch. And then the presentation Next Generation PostgreSQL Replication covers their BDR product, their bi-directional replication, or master-to-master replication. The only downside of this particular video is that the audio quality wasn't great, but if you're interested in master-to-master replication, it would definitely be a presentation to check out.

[00:14:18] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 161

April 18, 2021 00:18:11

Scaling Lessons, Synchronous Commit, Connection Management, BLOB Cleanup | Scaling Postgres 161

In this episode of Scaling Postgres, we discuss 5 years of scaling lessons, working with synchronous commit, managing connections and cleaning up BLOBs. To...


Episode 185

October 04, 2021 00:21:19

Postgres 14 Released, Using JSON, Not Using Indexes, Sequence Gaps | Scaling Postgres 185

In this episode of Scaling Postgres, we discuss the release of Postgres 14, how best to use JSON, why are your indexes not being...


Episode 137

October 26, 2020 00:21:14

Toast Impact, zheap Storage, More Performance, pg_rewind Changes | Scaling Postgres 137

In this episode of Scaling Postgres, we discuss the impact of toast, zheap storage results, more Postgres 13 performance data and changes to pg_rewind....
